4aSC12 – When it comes to recognizing speech, being in noise is like being old

Kristin Van Engen – kvanengen@wustl.edu
Avanti Dey
Nichole Runge
Mitchell Sommers
Brent Spehar
Jonathan E. Peelle

Washington University in St. Louis
1 Brookings Drive
St. Louis, MO 63130

Popular version of paper 4aSC12
Presented Thursday morning, May 10, 2018
175th ASA Meeting, Minneapolis, MN

How hard is it to recognize a spoken word?

Well, that depends. Are you old or young? How is your hearing? Are you at home or in a noisy restaurant? Is the word one that is used often, or one that is relatively uncommon? Does it sound similar to lots of other words in the language?

As people age, understanding speech becomes more challenging, especially in noisy situations like parties or restaurants. This is perhaps unsurprising, given the large proportion of older adults who have some degree of hearing loss. However, hearing measurements do not actually do a very good job of predicting the difficulty a person will have with speech recognition, and older adults tend to do worse than younger adults even when their hearing is good.

We also know that some words are more difficult to recognize than others. Words that are used rarely are more difficult than common words, and words that sound similar to many other words in the language are recognized less accurately than unique-sounding words. Relatively little is known, however, about how these kinds of challenges interact with background noise to affect the process of word recognition or how such effects might change across the lifespan.

In this study, we used eye tracking to investigate how noise and word frequency affect the process of understanding spoken words. Listeners were shown a computer screen displaying four images, and listened to the instruction “Click on the” followed by a target word (e.g., “Click on the dog.”). As the speech signal unfolded, the eye tracker recorded the moment-by-moment direction of the person’s gaze (60 times per second). Since listeners direct their gaze toward the visual information that matches incoming auditory information, this allows us to observe the process of word recognition in real time.

Our results indicate that word recognition is slower in noise than in quiet, slower for low-frequency words than high-frequency words, and slower for older adults than younger adults. Interestingly, noise slowed young adults down more than it slowed older adults. The main difference between the groups, however, was that young adults were considerably faster to recognize words in quiet conditions. That is, word recognition by older adults didn’t differ much from quiet to noisy conditions, but young listeners looked like older listeners when tasked with listening to speech in noise.

2pAAa10 – Turn around when you’re talking to me!

Jennifer Whiting – jkwhiting@physics.byu.edu
Timothy Leishman, PhD – tim_leishman@physics.byu.edu
K.J. Bodon – joshuabodon@gmail.com

Brigham Young University
N283 Eyring Science Center
Provo, UT 84602

Popular version of paper 2pAAa10, “High-resolution measurements of speech directivity”
Presented Tuesday afternoon, November 3, 2015, 4:40 PM, Grand Ballroom 3
170th ASA Meeting, Jacksonville

In general, most sources of sound do not radiate equally in all directions. The human voice is no exception to this rule. How strongly sound is radiated in a given direction at a specific frequency, or pitch, is called directivity. While many researchers have studied the directivity of speaking and singing voices, some important details are missing. The research reported in this presentation measured the directivity of live speech at higher angular and frequency resolutions than had previously been achieved, in an effort to capture the missing details.

Measurement methods
The approach uses a semicircular array of 37 microphones spaced at five-degree polar-angle increments (see Figure 1). The subject sits on a computer-controlled rotating chair, with the mouth aligned with the axis of rotation and the center of the microphone array, and repeats a series of phonetically balanced sentences at each of 72 five-degree azimuthal-angle increments. This results in 2522 measurement points on a sphere around the subject.
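The count of 2522 points, rather than 37 × 72 = 2664, can be checked with a short sketch. It assumes (this is not stated in the text) that the semicircular array spans polar angles from 0 to 180 degrees, so that the two end microphones lie on the axis of rotation and each records the same direction at every chair position. The function name and parameters are hypothetical and chosen only for illustration.

```python
def measurement_grid(n_mics=37, n_rotations=72, step_deg=5):
    """Return the set of unique (polar, azimuth) directions, in degrees."""
    points = set()
    for mic in range(n_mics):
        polar = mic * step_deg          # 0, 5, ..., 180 for 37 microphones
        for turn in range(n_rotations):
            azimuth = turn * step_deg   # 0, 5, ..., 355 for 72 chair positions
            if polar in (0, 180):
                # Assumption: a microphone on the rotation axis sees the same
                # direction at every chair position, so collapse its azimuth.
                azimuth = 0
            points.add((polar, azimuth))
    return points


grid = measurement_grid()
print(len(grid))  # 35 * 72 + 2 = 2522 unique directions
```

Under that assumption, the 35 off-axis microphones contribute 35 × 72 = 2520 directions and the two on-axis microphones contribute one each, which matches the figure quoted above.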

[MISSING Figure 1. A subject and the measurement array]

The measurements are based on audio recordings of the subject, who tries to repeat the sentences with exactly the same timing and inflection at each rotation. To account for the inevitable differences between repetitions, a transfer function and the coherence between a reference microphone near the subject and each measurement microphone on the semicircular array are computed. The coherence is used to judge the quality of each measurement. The transfer functions for all of the measurement points make up the directivity. To visualize the results, each measurement is plotted on a sphere, where the color and the radius of the sphere indicate how strongly sound is radiated in that direction for a given frequency. Animations of these spherical plots show how the directivity differs for each frequency.
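To illustrate the idea of a transfer function and coherence between a reference signal and a measured signal, here is a minimal sketch. It is not the authors' processing chain: it assumes a simple segment-averaged "H1" estimator with no windowing or overlap, uses synthetic random signals in place of speech, and all function and variable names are hypothetical.

```python
import cmath
import random


def dft(segment):
    """Naive discrete Fourier transform, positive-frequency bins only."""
    n = len(segment)
    return [
        sum(segment[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def transfer_and_coherence(reference, measured, seg_len=64):
    """Segment-averaged H1 transfer function and magnitude-squared coherence."""
    n_bins = seg_len // 2 + 1
    sxx = [0.0] * n_bins   # auto-spectrum of the reference microphone
    syy = [0.0] * n_bins   # auto-spectrum of the array microphone
    sxy = [0j] * n_bins    # cross-spectrum between the two
    for start in range(0, len(reference) - seg_len + 1, seg_len):
        x = dft(reference[start:start + seg_len])
        y = dft(measured[start:start + seg_len])
        for k in range(n_bins):
            sxx[k] += abs(x[k]) ** 2
            syy[k] += abs(y[k]) ** 2
            sxy[k] += x[k].conjugate() * y[k]
    h = [sxy[k] / sxx[k] for k in range(n_bins)]
    coh = [abs(sxy[k]) ** 2 / (sxx[k] * syy[k]) for k in range(n_bins)]
    return h, coh


random.seed(0)
ref = [random.gauss(0, 1) for _ in range(64 * 16)]       # stand-in for the reference mic
clean = [0.5 * v for v in ref]                           # same sound at half the amplitude
noisy = [0.5 * v + random.gauss(0, 1) for v in ref]      # same sound buried in unrelated noise

h_clean, coh_clean = transfer_and_coherence(ref, clean)
h_noisy, coh_noisy = transfer_and_coherence(ref, noisy)
print(abs(h_clean[5]), coh_clean[5])        # gain close to 0.5, coherence close to 1
print(sum(coh_noisy) / len(coh_noisy))      # average coherence well below 1
```

Because the transfer function is a ratio between the two microphones, it does not depend on how loudly the sentence happened to be spoken on a given repetition. The coherence drops toward zero when the array microphone mostly picks up sound unrelated to the reference, which is why it serves as a quality check.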

[MISSING Figure 2. Balloon plot for male speech directivity at 500 and 1000 Hz.]
[MISSING Figure 3. Balloon plot for female speech directivity at 500 and 1000 Hz.]
[MISSING Animation 1. Male Speech Directivity, animated]
[MISSING Animation 2. Female Speech Directivity, animated]

Results and Conclusions
Some unique results are visible in the animations. Most importantly, as frequency increases, one can see that most of the sound is radiated in the forward direction. This is one reason why it’s hard to hear someone talking in the front of a car when you’re sitting in the back, unless they turn around to talk to you. One can also see in the animations that as frequency increases, and most of the sound radiates forward, there is poor coherence in the back area. This doesn’t necessarily indicate a poor measurement, just a poor signal-to-noise ratio, since there is little sound energy in that direction. It’s also interesting to see that the polar angle of the strongest radiation changes with frequency. At some frequencies the sound is radiated strongly downward and to the sides, but at other frequencies the sound is radiated strongly upward and forward. Male and female directivities are similar in shape, but the similar shapes occur at different frequencies, since the fundamental frequencies of male and female voices are so different.

A more complete understanding of speech directivity could benefit several industries. For example, hearing aid companies can use speech directivity patterns to decide where to aim the microphones in hearing aids so that they pick up the best sound for a wearer who is having a conversation. Microphone placement in cell phones can be adjusted to get a clearer signal from the person talking into the phone. The theater and audio industries can use directivity patterns to assist in positioning actors on stage, or in placing microphones near the speakers to record the most spectrally rich speech. The scientific community can develop more complete models of human speech based on these measurements. Further study on this subject will allow researchers to improve the measurement method and analysis techniques, to understand the results more fully, and to generalize them to all speech containing phonemes similar to those in these measurements.