Popular version of paper 2pPPb3
Presented Wednesday afternoon, May 31, 2000
139th ASA Meeting, Atlanta, GA
Introduction
The human body's sensory capabilities have proven difficult to equal with engineered systems. Speech understanding is no exception: the human ability to decipher speech in degraded environments far exceeds that of computers. The intent of this research project is to model human binaural hearing to see whether reverberant speech recordings can be made more understandable. The first step in this process is investigating how well humans can remove reverberation from speech.
Background
The sound heard in a room consists of a "direct signal" plus hundreds of thousands of echoes. These echoes, the result of speech reflecting off the walls, floor, and ceiling, usually arrive before the original speech sound has stopped. Eventually, after many reflections, absorption by surfaces and objects in the room reduces the amplitude of these echoes to the point where they are no longer audible. The time it takes a sound to "ring down" to this inaudible level is called the "reverberation time," and it is one way of describing the severity of a room's reverberation. The greater the reverberation time, the more difficult it is to understand speech in the room.
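To make the idea concrete, the short Python sketch below models the ring-down as an ideal exponential decay and recovers the reverberation time as the moment the sound level has dropped 60 dB, the usual engineering definition (RT60). The sample rate and decay time are illustrative assumptions, not values from this study.

    import numpy as np

    fs = 16000                       # sample rate in Hz (assumed value)
    rt60 = 0.8                       # reverberation time in seconds (assumed value)
    t = np.arange(0, 2.0, 1.0 / fs)  # two seconds of "ring down"

    # In this idealized model, echo power falls by 60 dB over rt60 seconds,
    # so the amplitude envelope is 10**(-3 * t / rt60).
    envelope = 10.0 ** (-3.0 * t / rt60)

    # Recover the reverberation time from the envelope: the first instant
    # at which the level has dropped 60 dB below its starting value.
    level_db = 20.0 * np.log10(envelope)
    print("estimated RT60: %.2f s" % t[np.argmax(level_db <= -60.0)])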
The following words will demonstrate the effect of echoes on speech.
Words Without Echoes
Words recorded in an echoic room
Words with simulated echoes
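Echoes like those in the third demonstration can be produced by convolving a dry recording with a synthetic room impulse response. Below is a minimal sketch of one common approach, using an exponentially decaying noise burst as a rough stand-in for a real room; the sample rate, decay time, and noise model are illustrative assumptions, not necessarily the method used to make the recordings above.

    import numpy as np
    from scipy.signal import fftconvolve

    def simulate_reverb(dry, fs=16000, rt60=0.8):
        """Add simulated echoes to a dry (echo-free) signal."""
        t = np.arange(0, rt60, 1.0 / fs)
        envelope = 10.0 ** (-3.0 * t / rt60)         # -60 dB of decay at t = rt60
        rng = np.random.default_rng(0)
        ir = rng.standard_normal(t.size) * envelope  # decaying-noise impulse response
        ir[0] = 1.0                                  # the direct signal arrives first
        wet = fftconvolve(dry, ir)                   # every sample spawns its own echoes
        return wet / np.max(np.abs(wet))             # normalize to avoid clipping

    # Usage with one second of placeholder "speech" (noise stands in for a word):
    dry = np.random.default_rng(1).standard_normal(16000)
    wet = simulate_reverb(dry)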
In large auditoriums we are generally aware of the reverberation of speech, yet in smaller rooms, where there may be many more echoes, the reverberation seems to go unnoticed. To examine why this might occur, we assume that the brain contains different processing regions for speech and non-speech sounds. It is possible that the speech portion of the brain is capable of understanding reverberant speech and converting sounds to words. In this way the acoustics of the speech are thrown away and replaced by a representation of the word, and the reverberation is perceptually ignored. It is also conceivable that echoes are "filtered" out somewhere before this speech-processing region, delivering a clean speech signal to the speech processor. Either possibility might explain why we are frequently unaware of reverberation. It is only at extremely high levels of reverberation, when either of these processes breaks down, that the non-speech-processing portion of the brain interprets the speech as reverberant.
If the brain is capable of echo removal, these processes might be modeled to remove echoes from recordings. In fact, various neurological processing models have been proposed that seem useful in explaining how these echoes might be removed. These models consider differences between the sounds received by the left and right ears, and have traditionally been used to explain sound localization, selective attention to a speaker at a party, and the precedence effect. The precedence effect is particularly interesting because it permits localization of a sound source in echoic rooms based on the sound that arrives directly; this problem is strikingly similar to the dereverberation problem. To test whether these models are capable of echo removal, their predictions need to be compared to actual human performance. Since the necessary intelligibility data are not available, tests are currently in progress to determine how well humans understand reverberant speech signals.
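The left-versus-right comparison at the heart of many of these binaural models can be illustrated with a toy interaural cross-correlation, which estimates the interaural time difference (ITD) used in sound localization. This is only an illustrative sketch with made-up signals, not one of the neurological models referred to above.

    import numpy as np

    fs = 16000
    rng = np.random.default_rng(0)
    left = rng.standard_normal(fs)   # placeholder left-ear signal
    right = np.roll(left, 8)         # the same sound delayed 8 samples (~0.5 ms)

    # Slide one ear's signal against the other and find the lag where the two
    # line up best; that lag is the interaural time difference.
    lags = np.arange(-20, 21)
    corr = [np.dot(left, np.roll(right, -lag)) for lag in lags]
    itd = lags[int(np.argmax(corr))] / fs
    print("estimated ITD: %.2f ms" % (itd * 1000.0))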
Intelligibility Testing
Since we appear to be so adept at understanding speech even in degraded environments, a single-syllable word test was chosen to make the test difficult without resorting to unrealistic reverberation times. The test is based on a standard procedure, ANSI S3.2, which uses 20 lists of 50 words each, chosen to represent the average phonetic usage of conversational English. Subjects listen to these words in reverberant rooms and repeat what they hear. Their responses are then compared to the original words to obtain an intelligibility score. In addition, the words are subjected to simulated reverberation generated by computer and played over headphones to the listeners. Simulated reverberation can be precisely controlled and is reproducible, and for these reasons it is preferable to testing in actual rooms.
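The scoring step itself is simple to state precisely. The sketch below, using placeholder words rather than the actual ANSI S3.2 lists, computes a percent-correct score from a listener's repeated words:

    def intelligibility_score(targets, responses):
        """Percentage of words the listener repeated correctly."""
        correct = sum(t.strip().lower() == r.strip().lower()
                      for t, r in zip(targets, responses))
        return 100.0 * correct / len(targets)

    targets = ["heap", "fuss", "dike", "rat", "toe"]      # placeholder words
    responses = ["heap", "fuss", "bike", "rat", "dough"]  # what the listener repeated
    print("score: %.0f%%" % intelligibility_score(targets, responses))  # 60%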
These tests address three specific questions. First, does reverberation time affect intelligibility? Second, does the use of two ears improve understanding? And last, is there something special about reverberation, as opposed to other forms of noise, that facilitates neurological processing?
Applications and Future Work
If humans are capable of removing echoes from speech, and these processes can be modeled, engineers will have a new tool for removing reverberation from speech recordings. Such a tool could be used in hearing aids to provide clean speech signals to people with hearing loss in one ear or with neurological auditory impairments. Applications also exist in teleconferencing, telecommunications, and audio recording. If, on the other hand, humans cannot remove echoes from speech but are instead able to recognize reverberant speech directly, that result would be useful in the design of speech recognition systems.