Acoustical Society of America
157th Meeting Lay Language Papers



How Visual Cues Help Us Understand Speech in a Complex Environment, and
Auditory Attention and the Active Listener

Barbara Shinn-Cunningham - shinn@cns.bu.edu
Lingqiang Kong - konglq@cns.bu.edu
Auditory Neuroscience Laboratory,
Boston University
Boston, MA 02215

Popular version of papers 3pID2 and 4aPP8
Presented Wednesday afternoon, May 20, 2009 and Thursday morning, May 21, 2009
157th ASA Meeting, Portland, OR

In many social settings, such as a busy restaurant, multiple sounds come into the ears from all directions, competing for our attention. Young, healthy listeners with normal hearing are good at focusing on whatever source interests them (like the waiter and busboy gossiping over the chef's love life) and ignoring other uninteresting or annoying sounds (like your blind date droning on about the weather, or the snob at the next table grousing that his filet mignon is overcooked). The ways in which the brain allows us to perform this dazzling feat of selective attention are still not well understood. Moreover, many people have trouble with selective auditory attention, including older listeners and people with even modest hearing loss, so understanding how young listeners with normal hearing accomplish this task has real-world application: it can guide the development of improved hearing aids and listening devices. Here, we show that in social settings, visual cues help us focus selective attention in multiple, distinct ways.

We know from other studies in our laboratory that in complex auditory scenes, knowing what distinguishes a target from competing signals is important for enabling selective attention. For instance, try the following demonstration. Listen carefully ONCE to the following sound mixture for the metallic, robotic male voice, and see if you can hear the phone number. Listening more than once is cheating... (and so is reading ahead to the full explanation of the demonstration!).

Ready? Listen for the metallic, robotic male voice reading the phone number in this sound example: LISTEN

Can you report the phone number? The correct answer is 353-4342 (here is that male robot all by itself: LISTEN).

Most young, normal-hearing listeners have no problem with this selective attention task.

But there is a trick: what, if anything, can you recall about the competing sound from that mixture? Most people cannot remember much of anything -- maybe, at most, that it was a female talker. You were able to hear the male voice selectively because you knew to listen for a male voice and because it started first. The very act of selectively listening to it caused the other (female) voice to be ignored.

Now listen to the mixture again, but this time listen for the female voice, which starts a little later: LISTEN

Even with the identical sound file, most listeners have no trouble hearing and understanding the female voice. In many everyday situations, each of several competing sounds is completely understandable, just as the two voices were in this demonstration. However, we really only process one thing at a time (as the talker says herself: LISTEN).

Determining which sound to attend to selectively is the limiting factor in many everyday settings.

The current study examined how visual cues may help listeners in a selective auditory attention task. For instance, at a cocktail party, visual cues may show a listener where or when to direct attention, which elements of the sound mixture come from the source of interest, and/or what speech sounds the talker is producing (via lip reading). We measured the intelligibility of a desired target sound while varying the visual cues available in a complex, confusing auditory scene, in order to determine which of these aspects of visual information help direct selective attention, and how the importance of the visual information changes with the complexity of the auditory scene.

Because of the way we constructed the simulated auditory scene in our experiments, the main problem limiting performance, as in many real-world social settings, was not that the target speech was inaudible, but rather that there was too much going on at once (attentional capacity was the limiting factor, rather than audibility, just as in the earlier sound examples). Specifically, our subjects listened for a target utterance in the presence of multiple masker utterances with similar grammatical structure, spoken by the same talker and coming from different directions. The timing and direction of the target (and maskers) varied randomly, so that, without visual cues, our listeners were often unsure of where or when to focus auditory attention.

We found that performance tended to improve as the amount of visual information increased. In particular, knowing just when or where to listen was helpful, even if the visual cues were static images. This is interesting because most past studies exploring how visual cues aid speech understanding have focused on lip reading, in which listeners see the target talker moving his or her mouth to form the target utterance. Those studies show that lip-reading information helps listeners understand speech, particularly when the speech is right at the edge of audibility (so that some portions of the speech are inaudible). The advantages of lip reading are thought to arise because the talker's tongue, lip, and mouth movements give the listener explicit knowledge of what the talker said. In our study, we found evidence for benefits of lip reading, but we also found benefits that were even simpler: a static visual cue indicating when the target occurred improved performance over knowing only where it was coming from, and a cue telling listeners where to listen helped even if it told them nothing about when the target occurred.

These results show that visual cues help us communicate in everyday settings not just by allowing us to lip read, but also by showing us where and when to direct selective attention. This study demonstrates that visual cues provide many different kinds of information that help us converse in social settings like a cocktail party or a boardroom meeting.
