William A. Yost – william.yost@asu.edu
Michael Torben Pastore – m.torben.pastore@gmail.com

Speech and Hearing Science
Arizona State University
PO Box 870102
Tempe AZ, 85287-0102

Popular version of paper 4aPP7
Presented Thursday morning, May 10, 2018
175th ASA Meeting, Minneapolis, MN

This paper is part of a special session honoring Dr. Neal Viemeister of the University of Minnesota for his brilliant career. One of the topics Dr. Viemeister studies is loudness perception. Our presentation deals with the perceived loudness of an auditory scene in which several people talk at about the same time. In the real world, the sounds of all the talkers combine into one complex sound before they reach a listener’s ears. The auditory brain sorts this single complex sound into acoustic “images,” where each image represents the sound of one talker. In our research, we try to understand how many such images can be “pulled out” of an auditory scene so that they are perceived as separate, identifiable talkers.

In one type of simple experiment, listeners are asked how many talkers must be added to an auditory scene before they notice that the number of talkers has increased. Adding talkers makes the overall sound louder, and this change in loudness can serve as a cue for deciding which scene has more talkers. If we equalize the overall loudness of, say, a four-talker scene and a six-talker scene, the individual talkers in the six-talker scene will be less loud than the individual talkers in the four-talker scene.

If listeners can focus on the individual talkers in the two scenes, they might use this change in the loudness of individual talkers as a cue for discrimination. If they cannot focus on individual talkers, the two scenes may not be discriminable and are likely to be judged equally loud. We have found that listeners can judge the loudness of individual talkers in scenes of two or three talkers, but no more. This suggests that the loudness of a complex sound may depend on how well the individual components of the sound are perceived and, if so, that the auditory brain can process only two or three such components (images, talkers) at a given time.

Trying to listen to one or more people when many people are talking at the same time is difficult, especially for people who are hard of hearing. If the normal auditory system can process only a few sound sources presented at the same time, this reduces the complexity of devices (e.g., hearing aids) designed to help people with hearing impairment process sounds in complex acoustic environments. In auditory virtual reality (AVR) scenarios, there is a computational cost associated with processing each sound source. If an AVR system needs to process only a few sound sources to mimic normal hearing, it would be far less expensive than a system that must process many sound sources. (Supported by grants from the National Institutes of Health (NIDCD) and Oculus VR, LLC.)