5aPA – A Robust Smartphone Based Multi-Channel Dynamic-Range Audio Compression for Hearing Aids
Yiya Hao – yxh133130@utdallas.edu
Ziyan Zou – ziyan.zou@utdallas.edu
Dr. Issa M S Panahi – imp015000@utdallas.edu
Statistical Signal Processing Laboratory (SSPRL)
The University of Texas at Dallas
800 W Campbell Road, Richardson, TX – 75080, USA
Popular Version of Paper 5aPA, “A Robust Smartphone Based Multi-Channel Dynamic-Range Audio Compression for Hearing Aids”
Presented Friday morning, May 11, 2018, 10:15 – 10:30 AM, GREENWAY J
175th ASA Meeting, Minneapolis
According to the National Institute on Deafness and Other Communication Disorders (NIDCD), nearly 15% of adults aged 18 and over in the United States (about 37 million people) report some kind of hearing loss. Worldwide, 360 million people suffer from hearing loss.
Hearing impairment degrades the perception of speech and audio signals because of frequency-dependent changes in audibility thresholds. Hearing aid devices (HADs) apply prescription gains and dynamic-range compression to improve users’ audibility without raising loudness to uncomfortable levels. Multi-channel dynamic-range compression improves the quality and intelligibility of the audio output by treating each frequency band with its own compression parameters, such as compression ratio (CR), attack time (AT), and release time (RT).
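To make the three parameters concrete, here is a minimal single-channel sketch in Python. It is not the implementation described in this paper: the threshold, ratio, and time constants are illustrative values, and the function names are hypothetical. The compression ratio sets how strongly levels above a threshold are reduced, while the attack and release times set how quickly the level estimate rises and falls.

```python
import math

def compressor_gain_db(level_db, threshold_db, ratio):
    """Static compression curve: gain in dB for a given input level in dB."""
    if level_db <= threshold_db:
        return 0.0  # below threshold the signal passes unchanged
    # Above threshold, output rises 1 dB for every `ratio` dB of input.
    target_db = threshold_db + (level_db - threshold_db) / ratio
    return target_db - level_db

def smoothing_coeff(time_ms, sample_rate_hz):
    """One-pole smoothing coefficient for a given time constant."""
    return math.exp(-1.0 / (time_ms * 1e-3 * sample_rate_hz))

def compress(samples, sample_rate_hz=16000, threshold_db=-30.0,
             ratio=3.0, attack_ms=5.0, release_ms=50.0):
    """Apply compression sample by sample (all defaults are assumed values)."""
    a_att = smoothing_coeff(attack_ms, sample_rate_hz)
    a_rel = smoothing_coeff(release_ms, sample_rate_hz)
    envelope = 0.0
    out = []
    for x in samples:
        mag = abs(x)
        # Attack coefficient while the level rises, release while it falls.
        coeff = a_att if mag > envelope else a_rel
        envelope = coeff * envelope + (1.0 - coeff) * mag
        level_db = 20.0 * math.log10(max(envelope, 1e-9))
        gain = 10.0 ** (compressor_gain_db(level_db, threshold_db, ratio) / 20.0)
        out.append(x * gain)
    return out
```

With these assumed settings, a loud tone is attenuated once the envelope has settled, while a signal that stays below the threshold is left untouched.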
Increasing the number of compression channels can produce more comfortable audio output when appropriate parameters are defined for each channel. However, more channels also increase the computational complexity of the multi-channel compression algorithm, which limits its use on some HADs. In this paper, we propose a nine-channel dynamic-range compression (DRC) method with an optimized structure that can run in real time on smartphones and other portable digital platforms. Test results showing the performance of the proposed method are also presented. Fig. 1 shows the block diagram of the proposed method, and Fig. 2 shows the block diagram of the compressor.
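The idea of giving each band its own parameters can be sketched as follows. This is an illustration under stated assumptions, not the optimized structure proposed here: the log-spaced band edges between 125 Hz and 8 kHz, the naive DFT, and all function names are hypothetical, and attack/release smoothing and overlap between frames are left out for brevity.

```python
import cmath
import math

SAMPLE_RATE_HZ = 16000
NUM_CHANNELS = 9

def band_edges(num_channels=NUM_CHANNELS, f_lo=125.0, f_hi=8000.0):
    """Log-spaced band edges in Hz (an assumed layout)."""
    step = (f_hi / f_lo) ** (1.0 / num_channels)
    return [f_lo * step ** k for k in range(num_channels + 1)]

def channel_of(freq_hz, edges):
    """Channel index for a frequency; out-of-range values clamp to the ends."""
    for ch in range(len(edges) - 1):
        if freq_hz < edges[ch + 1]:
            return ch
    return len(edges) - 2

def static_gain_db(level_db, threshold_db, ratio):
    """Gain in dB from the static compression curve."""
    over = max(level_db - threshold_db, 0.0)
    return over / ratio - over

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real for t in range(n)]

def compress_frame(frame, params, edges):
    """Compress one frame; params[ch] = (threshold_db, ratio) per channel."""
    n = len(frame)
    spectrum = dft(frame)
    # Bin k and its mirror n - k share a frequency, hence the same channel.
    chans = [channel_of(min(k, n - k) * SAMPLE_RATE_HZ / n, edges)
             for k in range(n)]
    power = [0.0] * (len(edges) - 1)
    for k, ch in enumerate(chans):
        power[ch] += abs(spectrum[k]) ** 2 / n ** 2  # mean-square per band
    gains = []
    for ch, (threshold_db, ratio) in enumerate(params):
        level_db = 10.0 * math.log10(max(power[ch], 1e-12))
        gains.append(10.0 ** (static_gain_db(level_db, threshold_db, ratio) / 20.0))
    return idft([spectrum[k] * gains[chans[k]] for k in range(n)])
```

In this sketch a loud 1 kHz component is attenuated by its own channel while a quiet 4 kHz component in another channel passes unchanged, which is the behavior a single-channel compressor cannot provide. The cost of the per-band work grows with the number of channels, which is why an efficient structure matters on portable hardware.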
Several experiments were carried out: processing-time measurements of the real-time implementation of the proposed method on an Android smartphone, objective evaluations, and subjective evaluations. A commercial audio compressor and limiter provided by Hotto Engineering [1], running on a laptop, is used for comparison. The proposed method runs on a Google Pixel smartphone with Android 6.0.1. The sampling rate is set to 16 kHz and the frame size to 10 ms.
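The frame settings above fix the real-time budget: at 16 kHz, a 10 ms frame holds 160 samples, and the algorithm must finish each frame in under 10 ms. The short Python sketch below works through that arithmetic and shows one way to time a per-frame routine; the function names are hypothetical and this is not the measurement procedure used on the smartphone.

```python
import time

def samples_per_frame(sample_rate_hz, frame_ms):
    """Number of samples in one frame."""
    return int(round(sample_rate_hz * frame_ms / 1000.0))

def average_frame_time_ms(process, frame, repeats=100):
    """Average wall-clock time in ms that `process` spends on one frame."""
    start = time.perf_counter()
    for _ in range(repeats):
        process(frame)
    return (time.perf_counter() - start) * 1000.0 / repeats

def is_real_time(processing_ms, frame_ms):
    """Real-time operation requires finishing a frame before the next arrives."""
    return processing_ms < frame_ms

frame_len = samples_per_frame(16000, 10)  # 160 samples
```

Note that round-trip latency also includes audio input and output buffering, so it is measured separately from the algorithm's processing time.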
Sentences from the Hearing in Noise Test (HINT) database, sampled at 16 kHz, are used. The first experiment measures the processing time on the smartphone. Two times were measured: the round-trip latency and the algorithm's processing time. The Larsen test was used to measure the round-trip latency [2], and the test setup is shown in Fig. 3. The average processing-time results are shown in Fig. 2 as well. Perceptual evaluation of speech quality (PESQ) [3] and short-time objective intelligibility (STOI) [4] were used to assess the objective quality and intelligibility of the proposed nine-channel DRC.
The results can be found in Fig. 4. Subjective tests, including a mean opinion score (MOS) test [5] and a word recognition (WR) test, were also conducted, and Fig. 5 shows the results. These results indicate that the proposed nine-channel DRC runs efficiently on the smartphone and provides good quality and intelligibility.
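For readers unfamiliar with the two subjective measures, the Python sketch below shows how such scores are typically computed. The 1-to-5 rating scale, the word-matching rule, and the function names are assumptions made for illustration; they are not the scoring protocol used in these experiments.

```python
def mean_opinion_score(ratings):
    """Average of listener ratings on an assumed 1 (bad) to 5 (excellent) scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    return sum(ratings) / len(ratings)

def word_recognition_percent(reference, response):
    """Percentage of reference words that appear in the listener's response."""
    ref_words = reference.lower().split()
    if not ref_words:
        raise ValueError("reference sentence is empty")
    remaining = response.lower().split()
    correct = 0
    for word in ref_words:
        if word in remaining:
            remaining.remove(word)  # each response word can match only once
            correct += 1
    return 100.0 * correct / len(ref_words)
```

For example, a listener who repeats three of the four words in a sentence correctly scores 75% on that sentence.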
In summary, the proposed nine-channel dynamic-range audio compression provides good quality and intelligibility while running on smartphones. All of its parameters can be preset from an individual's audiogram. With the proposed compression, multi-channel DRC is no longer restricted to costly advanced hardware such as hearing aids or laptops. The proposed method also provides a portable audio framework that is not limited to the current version of DRC and can be extended or upgraded for further research.
Please refer to our lab website http://www.utdallas.edu/ssprl/hearing-aid-project/ for video demos. Sample audio files are attached below.
Audio files:
Unprocessed_MaleSpeech.wav
Unprocessed_FemaleSpeech.wav
Unprocessed_Song.wav
Processed_MaleSpeech.wav
Processed_FemaleSpeech.wav
Processed_Song.wav
Key References:
- 2018. [Online]. Available: http://www.hotto.de/
- 2018. [Online]. Available: https://source.android.com/devices/audio/latency_measurements
- Rix, A. W., Beerends, J. G., Hollier, M. P., Hekstra, A. P., “Perceptual evaluation of speech quality (PESQ) – a new method for speech quality assessment of telephone networks and codecs,” IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP), 2, pp. 749-752, May 2001.
- Taal, C. H., Hendriks, R. C., Heusdens, R., Jensen, J., “An algorithm for intelligibility prediction of time-frequency weighted noisy speech,” IEEE Trans. Audio, Speech, Lang. Process. 19(7), pp. 2125-2136, Feb
- Streijl, R. C., Winkler, S., Hands, D. S., “Mean opinion score (MOS) revisited: methods and applications, limitations and alternatives,” in Multimedia Systems 22.2, pp. 213-227, 2016.
*This work was supported by the National Institute on Deafness and Other Communication Disorders (NIDCD) of the National Institutes of Health (NIH) under grant number 5R01DC015430-02. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. The authors are with the Statistical Signal Processing Research Laboratory (SSPRL), Department of Electrical and Computer Engineering, The University of Texas at Dallas.