Does increasing the playback speed of men’s and women’s voices reduce their intelligibility by the same amount?
Eric M. Johnson – firstname.lastname@example.org
Sarah Hargus Ferguson – email@example.com
Department of Communication Sciences and Disorders
University of Utah
390 South 1530 East, Room 1201
Salt Lake City, UT 84112
Popular version of poster 3pSC10, “Gender and rate effects on speech intelligibility.”
Presented Wednesday afternoon, May 25, 2016, 1:00, Salon G
171st ASA Meeting, Salt Lake City
Older adults seeking hearing help often report having an especially hard time understanding women’s voices. However, this anecdotal observation doesn’t always agree with the findings from scientific studies. For example, Ferguson (2012) found that male and female talkers were equally intelligible for older adults with hearing loss. Moreover, several studies have found that young people with normal hearing actually understand women’s voices better than men’s voices (e.g. Bradlow et al., 1996; Ferguson, 2004). In contrast, Larsby et al. (2015) found that, when listening in background noise, groups of listeners with and without hearing loss were better at understanding a man’s voice than a woman’s voice. The Larsby et al. data suggest that female speech might be more affected by distortion like background noise than male speech is, which could explain why women’s voices may be harder to understand for some people.
We were interested to see if another type of distortion, speeding up the speech, would have an equal effect on the intelligibility of men and women. Speech that has been sped up (or time-compressed) has been shown to be less intelligible than unprocessed speech (e.g. Gordon-Salant & Friedman, 2011), but no studies have explored whether time compression causes an equal loss of intelligibility for male and female talkers. If an increase in playback speed causes women’s speech to be less intelligible than men’s, it could reveal another possible reason why so many older adults with hearing loss report difficulty understanding women’s voices. To this end, our study tested whether the intelligibility of time-compressed speech decreases for female talkers more than it does for male talkers.
Using 32 listeners with normal hearing, we measured how much the intelligibility of two men and two women went down when the playback speed of their speech was increased by 50%. These four talkers were selected based on their nearly equivalent conversational speaking rates. We used digital recordings of each talker and made two different versions of each sentence they spoke: a normal-speed version and a fast version. The software we used allowed us to speed up the recordings without making them sound high-pitched.
Audio sample 1: A sentence at its original speed.
Audio sample 2: The same sentence sped up to 50% faster than its original speed.
All of the sentences were presented to the listeners in background noise. We found that the men and women were essentially equally intelligible when listeners heard the sentences at their original speed. Speeding up the sentences made all of the talkers harder to understand, but the effect was much greater for the female talkers than the male talkers. In other words, there was a significant interaction between talker gender and playback speed. The results suggest that time-compression has a greater negative effect on the intelligibility of female speech than it does on male speech.
Figure 1: Overall percent correct key-word identification performance for male and female takers in unprocessed and time-compressed conditions. Error bars indicate 95% confidence intervals.
Figure 1: Overall percent correct key-word identification performance for male and female takers in unprocessed and time-compressed conditions. Error bars indicate 95% confidence intervals.
These results confirm the negative effects of time-compression on speech intelligibility and imply that audiologists should counsel the communication partners of their patients to avoid speaking excessively fast, especially if the patient complains of difficulty understanding women’s voices. This counsel may be even more important for the communication partners of patients who experience particular difficulty understanding speech in noise.
Bradlow, A. R., Torretta, G. M., and Pisoni, D. B. (1996). “Intelligibility of normal speech I: Global and fine-grained acoustic-phonetic talker characteristics,” Speech Commun. 20, 255-272.
Ferguson, S. H. (2004). “Talker differences in clear and conversational speech: Vowel intelligibility for normal-hearing listeners,” J. Acoust. Soc. Am. 116, 2365-2373.
Ferguson, S. H. (2012). “Talker differences in clear and conversational speech: Vowel intelligibility for older adults with hearing loss,” J. Speech Lang. Hear. Res. 55, 779-790.
Gordon-Salant, S., and Friedman, S. A. (2011). “Recognition of rapid speech by blind and sighted older adults,” J. Speech Lang. Hear. Res. 54, 622-631.
Larsby, B., Hällgren, M., Nilsson, L., and McAllister, A. (2015). “The influence of female versus male speakers’ voice on speech recognition thresholds in noise: Effects of low-and high-frequency hearing impairment,” Speech Lang. Hear. 18, 83-90.
Popular version of paper 1pAA6, “Listening for solutions to a speech intelligibility problem”
Presented Monday afternoon, May 23, 2016, 2:45 in Salon E
171st ASA Meeting in Salt Lake City, UT
Loudspeakers for sound reinforcement systems are designed to project their sound in specific directions. Sound system designers take advantage of the “directivity” characteristics of these loudspeakers, aiming their sound uniformly throughout seating areas, while avoiding walls and ceilings and other surfaces from which undesirable reflections could reduce clarity and fidelity.
Many high-quality sound reinforcement loudspeaker systems incorporate horn loudspeakers that provide very good control, but these are relatively large and conspicuous. In recent years, “steerable column arrays” have become available, which are tall but narrow, allowing them to better blend into the architectural design. These are well suited to the frequency range of speech, and to some degree their sound output can be steered up or down using electronic signal processing.
Figure 1. steerable column arrays
Figure 1 illustrates the steering technique, with six individual loudspeakers in a vertical array. Each loudspeaker generates an ever-expanding sphere of sound (in this figure, simplified to show only the horizontal diameter of each sphere), propagating outward at the speed of sound, which is roughly 1 foot per millisecond. In the “not steered” column, all of the loudspeakers are outputting their sound at the same time, with a combined wavefront spreading horizontally, as an ever-expanding cylinder of sound. In the “steered downward” column, the electronic signal to each successively lower loudspeaker is slightly delayed; the top loudspeaker outputs its sound first, while each lower loudspeaker in turn outputs its sound just a little later, so that the sound energy is generally steered slightly downward. This steering allows for some flexibility in positioning the loudspeaker column. However, these systems only offer some vertical control; left-to-right projection is not well controlled.
Steerable column arrays have reasonably resolved speech reinforcement issues in many large, acoustically-problematic spaces. Such arrays were appropriate selections for a large worship space, with a balcony and a huge dome, that had undergone a comprehensive renovation. Unfortunately, in this case, problems with speech intelligibility persisted, even after multiple adjustments by reputable technicians, who had used their instrumentation to identify several sidewall surfaces that appeared to be reflecting sound and causing problematic echoes. They recommended additional sound absorptive treatment that could adversely affect visual aesthetics and negatively impact the popular classical music concerts.
Upon visiting the space as requested to investigate potential acoustical treatments, speech was difficult to understand in various areas on the main floor. While playing a click track (imagine a “pop” every 5 seconds) through the sound system, and listening to the results around the main floor, we heard strong echoes emanating from the direction of the surfaces that had been recommended for sound-absorptive treatment.
Nearby those surfaces, additional column loudspeakers had been installed to augment coverage of the balcony seating area. These balcony loudspeakers were time-delayed (in accordance with common practice, to accommodate the speed of sound) so that they would not produce their sound until the sound from the main loudspeakers had arrived at the balcony. With proper time delay, listeners on the balcony would hear sound from both main and balcony loudspeakers at approximately the same time, and thereby avoid what would otherwise seem like an echo from the main loudspeakers.
With more listening, it became clear that the echo was not due to reflections from the walls at all, but rather from the delayed balcony loudspeakers’ sound inadvertently spraying back to the main seating area. These loudspeakers cannot be steered in a multifaceted manner that would both cover the balcony and avoid the main floor.
We simply turned off the balcony loudspeakers, and the echo disappeared. More importantly, speech intelligibility improved significantly throughout the main floor. Intelligibility throughout the balcony remained acceptable, albeit not quite as good as with the balcony loudspeakers operating.
The general plan is to remove the balcony loudspeakers and relocate them to the same wall as the main loudspeakers, but steer them to cover the balcony.
Adding sound-absorptive treatment on the side walls would not have solved the problem, and would have squandered funds while impacting the visual aesthetics and classical music programming. Listening for solutions proved to be more effective than interpreting test results from sophisticated instrumentation.
Speech: An eye and ear affair!
Pamela Trudeau-Fisette – firstname.lastname@example.org
Lucie Ménard – email@example.com
Université du Quebec à Montréal
320 Ste-Catherine E.
Montréal, H3C 3P8
Popular version of poster session 2aSC, “Auditory feedback perturbation of vowel production: A comparative study of congenitally blind speakers and sighted speakers”
Presented Tuesday morning, May 19, 2015, Ballroom 2, 8:00 AM – 12:00 noon
169th ASA Meeting, Pittsburgh
When learning to speak, young infants and toddlers use auditory and visual cues to correctly associate speech movements to a specific speech sound. In doing so, typically developing children compare their own speech and those of their ambient language to build and improve the relationship between what they hear, see and feel, and how to produce it.
In many day-to-day situations, we exploit the multimodal nature of speech: in noisy environments, for instance like in a cocktail party, we look at our interlocutor’s face and use lip reading to recover speech sounds. When speaking clearly, we open our mouth wider to make ourself sound more intelligible. Sometimes, just seeing someone’s face is enough to communicate!
What happens in cases of congenital blindness? Despite the fact that blind speakers learn to produce intelligible speech, they do not quite speak like sighted speakers do. Since they do not perceive others’ visual cues, blind speakers do not produce visible labial movements as much as their sighted peers do.
Production of the French vowel “ou” (similar as in cool) produced by a sighted adult speaker (on the left) and a congenitally blind adult speaker (on the right). We can clearly see that the articulatory movements of the lips are more explicit for the sighted speaker.
Therefore, blind speakers put more weight on what they hear (auditory feedback) than sighted speakers, because one sensory input is lacking. How does that affect the way blind individuals speak?
To answer this question, we conducted an experiment during which we asked congenitally blind adult speakers and sighted adult speakers to produce multiple repetitions of the French vowel “eu”. While they were producing the 130 utterances, we gradually altered their auditory feedback through headphones – without them knowing it- so that they were not hearing the exact sound they were saying. Consequently, they needed to modify the way they produced the vowel in order to compensate for the acoustic manipulation, so they could hear the vowel they were asked to produce (and the one they thought they were saying all along!).
What we were interested in is whether blind speakers and sighted speakers would react differently to this auditory manipulation. The blind speakers not being able to rely on visual feedback, we hypothesized that they would grant more importance on their auditory feedback and, therefore, compensate to a greater extent for the acoustic manipulation.
To explore this matter, we observed the acoustic (produced sounds) and articulatory (lips and tongue movements) differences between the two groups at three distinct time points of the experiment phases.
As predicted, congenitally blind speakers compensated for the altered auditory feedback in a greater extent than their sighted peers. More specifically, even though both speaker groups adapted their productions, the blind group compensated more than the control group did, as if they were integrating the auditory information more strongly. Also, we found that both speaker groups used different articulatory strategies to respond to the applied manipulation: blind participants used their tongue (which is not visible when you speak) more to compensate. This latter observation is not surprising considering the fact that blind speakers do not use their lips (which is visible when you speak) as much as their sighted peers do.
Understanding conversation in noisy everyday situations can be a challenge for listeners, especially individuals who are older and/or hard-of-hearing. Listening in some everyday situations (e.g., at dinner parties) can be so challenging that people might even decide that they would rather stay home than go out. Eventually, avoiding these situations can damage relationships with family and friends and reduce enjoyment of and participation in activities. What are the reasons for these difficulties and why are some people affected more than other people?
How easy or challenging it is to listen may vary from person to person because some people have better hearing abilities and/or cognitive abilities compared to other people. The hearing abilities of some people may be affected by the degree or type of their hearing loss. The cognitive abilities of some people, for example how well they can attend to and remember what they have heard, can also affect how easy it is for them to follow conversation in challenging listening situations. In addition to hearing abilities, cognitive abilities seem to be particularly relevant because in many everyday listening situations people need to listen to more than one person talking at the same time and/or they may need to listen while doing something else such as driving a car or crossing a busy street. The auditory demands that a listener faces in a situation increase as background noise becomes louder or as more interfering sounds combine with each other. The cognitive demands in a situation increase when listeners need to keep track of more people talking or to divide their attention as they try to do more tasks at the same time. Both auditory and cognitive demands could result in the situation becoming very challenging and these demands may even totally overload a listener.
One way to measure information overload is to see how much a person remembers after they have completed a set of tasks. For several decades, cognitive psychologists have been interested in ‘working memory’, or a person’s limited capacity to process information while doing tasks and to remember information after the tasks have been completed. Like a bank account, the more cognitive capacity is spent on processing information while doing tasks, the less cognitive capacity will remain available for remembering and using the information later. Importantly, some people have bigger working memories than other people and people who have a bigger working memory are usually better at understanding written and spoken language. Indeed, many researchers have measured working memory span for reading (i.e., a task involving the processing and recall of visual information) to minimize ‘contamination’ from the effects of hearing loss that might be a problem if they measured working memory span for listening. However, variations in difficulty due to hearing loss may be critically important in assessing how the demands of listening affect different individuals when they are trying to understand speech in noise. Some researchers have studied the effects of the acoustical properties of speech and interfering noises on listening, but less is known about how variations in the type of language materials (words, sentences, stories) might alter listening demands for people who have hearing loss. Therefore, to learn more about why some people cope better when listening to conversation in noise, we need to discover how both their auditory and their cognitive abilities come into play during everyday listening for a range of spoken materials.
We predicted that speech understanding would be more highly associated with working memory span for listening than with listening span for reading, especially when more realistic language materials are used to measure speech understanding. To test these predictions, we conducted listening and reading tests of working memory and we also measured memory abilities using five other measures (three auditory memory tests and two visual memory tests). Speech understanding was measured with six tests (two tests with words, one in quiet and one in noise; three tests with sentences, one in quiet and two in noise; one test with stories in quiet). The tests of speech understanding using words and sentences were selected from typical clinical tests and involved simple immediate repetition of the words or sentences that were heard. The test using stories has been used in laboratory research and involved comprehension questions after the end of the story. Three groups with 24 people in each group were tested: one group of younger adults (mean age = 23.5 years) with normal hearing and two groups of older adults with hearing loss (one group with mean age = 66.3 years and the other group with mean age 74.3 years).
There was a wide range in performance on the listening test of working memory, but performance on the reading test of working memory was more limited and poorer. Overall, there was a significant correlation between the results on the reading and listening working memory measures. However, when correlations were conducted for each of the three groups separately, the correlation reached significance only for the oldest listeners with hearing loss; this group had lower mean scores on both tests. Surprisingly, for all three groups, there were no significant correlations among the working memory and speech understanding measures. To further investigate this surprising result, a factor analysis was conducted. The results of the factor analysis suggest that there was one factor including age, hearing test results and performance on speech understanding measures when the speech-understanding task was simply to repeat words or sentences – these seem to reflect auditory abilities. In addition, separate factors were found for performance on the speech understanding measures involving the comprehension of discourse or the use of semantic context in sentences – these seem to reflect linguistic abilities. Importantly, the majority of the memory measures were distinct from both kinds of speech understanding measures, and also a more basic and less cognitively demanding memory measure involving only the repetition of sets of numbers. Taken together, these findings suggest that working memory measures reflect differences between people in cognitive abilities that are distinct from those tapped by the sorts of simple measures of hearing and speech understanding that have been used in the clinic. Above and beyond current clinical tests, by testing working memory, especially listening working memory, useful information could be gained about why some people cope better than others in everyday challenging listening situations.
Presentation #1pSC2 “Effect of age, hearing loss, and linguistic complexity on listening effort as mentioned by working memory span” by Margaret K. Pichora-Fuller and Sherri L. Smith will be take place on Monday, May 18, 2015, at 1:55 PM in Kings 4 at the Wyndham Grand Pittsburgh Downtown Hotel. The abstract can be found by searching for the presentation number here: