151st ASA Meeting, Providence, RI

What Was That Snap in the Grass?

Brian Gygi - bgygi@ebire.org
Valeriy Shafiro
Acoustics Research Institute
Vienna, Austria

Popular version of paper 3aPP16
Presented Wednesday Morning, June 7, 2006

Listen to this sound. Now imagine that you are in a movie, walking through a forest with some hidden threat nearby, when suddenly that sound occurs. It should stop your breath and make your heart pound, because it is clearly the footstep of a villain or a predator. However, if you are working in your office and hear the same sound, it will more likely cause amusement or annoyance, because it is probably one of your coworkers crumpling up a piece of paper. Listen to the sound again by itself and in both backgrounds, and guess which one it is more likely to be. If you guessed the paper crumpling, you are correct.

This demonstrates the effect of context on our ability to identify important sounds in our environment. In a forest setting, where someone is unlikely to be crumpling a piece of paper, it is reasonable to assume the sound is the footstep of an enemy. In an office setting, where hopefully your enemies are not stalking you, a crumpled piece of paper is your best guess. Deciding which sounds are important and should be attended to, and which should be relegated to the background, is a complicated process, and it is remarkable that it happens as quickly as it does. Our expectations about what kinds of sounds are likely to occur interact with what we actually hear. In the vision literature, the effects of expectations are well known, as shown by the famous optical illusion of the picture that can be seen as an old woman or a young one depending on expectations.

The effects of context in an auditory setting have not been nearly so well studied, and the results have not been as straightforward. For this study we mixed familiar, easily identifiable environmental sounds, such as a person laughing, into a common auditory scene that was either congruent, or typical (a bowling alley), or incongruent, or atypical (a fire). (If you click on the links, you will hear the laugh mixed into each of the scenes, preceded and followed by two short dings.) More examples of the sound-scene combinations are included below. We had two groups of listeners identify the sounds. One group had been trained on both the sounds and the scenes in isolation, and the other had never heard either before. We fully expected that at least one of the groups would show an advantage for sounds in congruent scenes.
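
To make the mixing step concrete, here is a minimal sketch of how a target sound could be added to a background scene at a chosen level. It is an illustration only, not the procedure used in the study: the function names, the use of RMS level measured over the whole scene, and the decibel ratio parameter are all assumptions made for the example.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def gain_for_ratio(sound, scene, ratio_db):
    """Gain that puts the sound's RMS at ratio_db dB relative to the scene's RMS.

    Assumption for this sketch: levels are compared as whole-signal RMS.
    """
    sound_rms = rms(sound)
    if sound_rms == 0:
        raise ValueError("sound is silent; its level cannot be scaled")
    return (rms(scene) / sound_rms) * 10 ** (ratio_db / 20)


def mix_sound_into_scene(sound, scene, ratio_db, start):
    """Return a copy of scene with the scaled sound added from index start."""
    if start < 0 or start + len(sound) > len(scene):
        raise ValueError("sound must fit entirely inside the scene")
    gain = gain_for_ratio(sound, scene, ratio_db)
    mixed = list(scene)  # leave the original scene untouched
    for i, sample in enumerate(sound):
        mixed[start + i] += gain * sample
    return mixed


if __name__ == "__main__":
    # Toy signals standing in for a laugh and a bowling-alley recording.
    laugh = [1.0, -1.0]
    alley = [0.5, 0.5, 0.5, 0.5]
    print(mix_sound_into_scene(laugh, alley, ratio_db=-20, start=1))
```

In this sketch, a lower `ratio_db` makes the target quieter relative to the scene, which corresponds to the harder listening conditions.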

The results were somewhat surprising. The experienced listeners identified the sounds in incongruent scenes slightly but significantly better than the sounds in congruent scenes. One possible reason seems fairly obvious from listening to the stimuli. A laugh is fairly common in a bowling alley, so this particular laugh could easily just be part of the background, whereas a laugh is not common during a fire, so it jumps out at the listener.

For the inexperienced listeners, however, there was no overall difference between the two conditions. Only when the sounds were mixed very quietly into the scenes, so that the task was quite difficult, did the inexperienced listeners identify the sounds in congruent scenes slightly better.

These data imply that the relationship of background to foreground sounds is not easily characterized. It seems that when listeners are familiar with the sounds and settings, it is the sounds they don't expect that are easier to recognize. This makes some evolutionary sense, because it is when something is not normal (such as a lion approaching) that we need to pay attention. However, when listeners are unfamiliar with the sounds and settings, knowledge of what sounds are likely to occur can help them identify sounds that are difficult to hear.
