
Acoustical Society of America
139th Meeting Lay Language Papers


Human Capabilities of Echo Removal

Brad Libbey - gt1556a@prism.gatech.edu
Dr. Peter Rogers
Georgia Institute of Technology
School of Mechanical Engineering
Graduate Box 268
Atlanta, GA 30332-0405
(404) 892-6290

Popular version of paper 2pPPb3
Presented Wednesday afternoon, May 31, 2000
139th ASA Meeting, Atlanta, GA

Introduction

The human body's sensory capabilities have proven difficult to equal with engineered systems. Speech understanding is no exception to this rule: human listeners decipher speech in degraded environments far better than computers do. The intent of this research project is to model human binaural hearing to see whether reverberant speech recordings might be made more understandable. The first step in this process is investigating how well humans can remove reverberation from speech.

Background

The sound heard in a room consists of a "direct signal" plus hundreds of thousands of echoes. These echoes, the result of speech reflecting off the walls, floor, and ceiling, usually arrive before the original speech sound has stopped. Eventually, after many reflections, absorption by the surfaces and objects in the room reduces the amplitude of these echoes to the point where they are no longer audible. The time it takes a sound to "ring down" to this inaudible level is called the "reverberation time," and it is one way of describing the severity of a room's reverberation. The greater the reverberation time, the more difficult it is to understand speech in the room.
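To attach rough numbers to this idea, Sabine's classic formula (a standard result in room acoustics, not a result of this paper) estimates reverberation time from a room's volume and the total absorption of its surfaces. A minimal Python sketch with hypothetical values:

    def sabine_rt60(volume_m3, absorption_m2_sabins):
        # Sabine's formula: T60 = 0.161 * V / A (SI units), where V is
        # the room volume and A is the total surface absorption.
        return 0.161 * volume_m3 / absorption_m2_sabins

    # A hypothetical mid-sized auditorium: 5000 m^3, 800 m^2 sabins
    print(sabine_rt60(5000.0, 800.0))  # about 1.0 second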

The following recordings demonstrate the effect of echoes on speech:
Words Without Echoes
Words recorded in an echoic room
Words with simulated echoes

In large auditoriums we are generally aware of the reverberation of speech, yet in smaller rooms, where there may be many more echoes, the reverberation seems to go unnoticed. To examine why this might be, we assume that the brain contains different processing regions for speech and non-speech sounds. It is possible that the speech portion of the brain is capable of understanding reverberant speech and converting sounds to words. In this way the acoustics of the speech are thrown away and replaced by a representation of the word, and the reverberation is perceptually ignored. It is also conceivable that somewhere before this speech-processing region the echoes are "filtered" out, delivering a clean speech signal to the speech processor. Either possibility might explain why we are frequently unaware of reverberation. It is when reverberation is extreme, and either of these processes breaks down, that the non-speech-processing portion of the brain interprets the speech as reverberant.

If the brain is capable of echo removal, then these processes might be modeled to remove echoes from recordings. In fact, various neurological processing models have been proposed that seem useful in explaining how echoes might be removed. These models consider differences between the sounds received by the left and right ears, and they have traditionally been used to explain sound localization, selective attention to a single speaker at a party, and the precedence effect. The precedence effect is particularly interesting because it permits localization of a sound source in echoic rooms based on the sound that arrives directly; this problem is strikingly similar to the dereverberation problem. To test whether these models are capable of echo removal, comparisons need to be made to actual human performance. Since such intelligibility data are not available, tests are currently in progress to determine how well humans understand reverberant speech.
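A building block shared by many of these binaural models is interaural cross-correlation: the time lag that best aligns the left- and right-ear signals indicates the direction a sound came from. The Python sketch below illustrates that general idea only; it is not the authors' model, and the function name and parameters are ours.

    import numpy as np

    def estimate_itd(left, right, fs, max_itd_s=0.001):
        # Cross-correlate the two ear signals at every possible lag.
        corr = np.correlate(left, right, mode="full")
        lags = np.arange(-(len(right) - 1), len(left))
        # Keep only physically plausible interaural delays (about 1 ms).
        plausible = np.abs(lags) <= max_itd_s * fs
        best_lag = lags[plausible][np.argmax(corr[plausible])]
        return best_lag / fs  # seconds; the sign tells which ear leads

    # Hypothetical check: delay one channel by 8 samples (0.5 ms at 16 kHz).
    fs = 16000
    src = np.random.randn(fs)
    print(abs(estimate_itd(src, np.roll(src, 8), fs)))  # ~0.0005 s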

Intelligibility Testing

Since we appear to be so adept at understanding speech even in degraded environments, a single-syllable word test was chosen to make the test difficult without resorting to unrealistic reverberation times. The test is based on a standard procedure, ANSI S3.2, which uses 20 lists of 50 words each, chosen to represent the average phonetic usage of conversational English. Subjects listen to these words in reverberant rooms and repeat what they hear. Their responses are then compared to the original words to obtain an intelligibility score. In addition, the words are subjected to simulated reverberation generated by computer and then played over headphones. Simulated reverberation can be precisely controlled and is reproducible, and for these reasons it is preferable to testing in actual rooms.
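The paper does not say how the simulated reverberation is generated, but one common approach is to convolve the dry recording with a synthetic impulse response of exponentially decaying noise. The Python sketch below shows that technique under our own assumptions; the function and parameter names are illustrative, not the authors'.

    import numpy as np

    def simulate_reverb(dry, fs, rt60=1.0):
        # Synthetic room impulse response: white noise shaped by an
        # exponential envelope that decays 60 dB over rt60 seconds.
        n = int(rt60 * fs)
        t = np.arange(n) / fs
        ir = np.random.randn(n) * 10.0 ** (-3.0 * t / rt60)
        ir[0] = 1.0  # emphasize the direct (unreflected) path
        wet = np.convolve(dry, ir)
        return wet / np.max(np.abs(wet))  # rescale to prevent clipping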

These tests address three specific questions. First, how does reverberation time affect intelligibility? Second, does the use of two ears improve understanding? And finally, is there something special about reverberation, compared with other forms of noise, that facilitates neurological processing?

Applications and Future Work

If humans are capable of removing echoes from speech, and these processes can be modeled, engineers will have a new tool for removing reverberation from speech recordings. Such a tool could be used in hearing aids to provide a clean speech signal for people with hearing loss in one ear or with neurological auditory impairments. Applications also exist in teleconferencing, telecommunications, and audio recording. If humans are not capable of removing echoes from speech, and are instead capable of recognizing reverberant speech directly, that result would be useful in the design of speech recognition systems.

