Why is it easier to understand people we know?

Emma Holmes – emma.holmes@ucl.ac.uk
X (Twitter): @Emma_Holmes_90

University College London (UCL), Department of Speech, Hearing and Phonetic Sciences, London, WC1N 1PF, United Kingdom

Popular version of 4aPP4 – How does voice familiarity affect speech intelligibility?
Presented at the 186th ASA Meeting
Read the abstract at https://doi.org/10.1121/10.0027437

–The research described in this Acoustics Lay Language Paper may not have yet been peer reviewed–

It’s much easier to understand what others are saying if you’re listening to a close friend or family member, compared to a stranger. If you practice listening to the voices of people you’ve never met before, you might become better at understanding them too.

Many people struggle to understand what others are saying in noisy restaurants or cafés. This can become much more challenging as people get older. It’s often one of the first changes that people notice in their hearing. Yet, research shows that these situations are much easier if people are listening to someone they know very well.

In our research, we ask people to visit the lab with a friend or partner. We record their voices while they read sentences aloud. We then invite the volunteers back for a listening test. During the test, they hear sentences and click words on a screen to show what they heard. This is made more difficult by playing a second sentence at the same time, which the volunteers are told to ignore. This is like having a conversation when there are other people talking around you. Our volunteers listen to many sentences over the course of the experiment. Sometimes, the sentence is one recorded from their friend or partner. Other times, it’s one recorded from someone they’ve never met. Our studies have shown that people are best at understanding the sentences spoken by their friend or partner.

In one study, we manipulated the sentence recordings to change the sound of the voices. The voices still sounded natural. Yet, volunteers could no longer recognize them as their friend or partner. We found that participants were still better at understanding the sentences, even though they didn’t recognize the voice.

In other studies, we’ve investigated how people learn to become familiar with new voices. Each volunteer learns the names of three new people. They’ve never met these people, but we play them lots of recordings of their voices. This is like when you listen to a new podcast or radio show. We’ve found that people become very good at understanding these people. In other words, we can train people to become familiar with new voices.

In new work that hasn’t yet been published, we found that voice familiarization training benefits both older and younger people. So, it may help older people who find it very difficult to listen in noisy places. Many environments contain background noise—from office parties to hospitals and train stations. Ultimately, we hope that we can familiarize people with voices they hear in their daily lives, to make it easier to listen in noisy places.

World Hearing Day 2024

The Acoustical Society of America (ASA) takes pride in its mission to generate, disseminate, and promote the knowledge and practical applications of acoustics. This also aligns with one of the objectives of World Hearing Day 2024: to reshape public perceptions surrounding ear and hearing care based on accurate, evidence-based information. In support of World Hearing Day[i], we would like to draw attention to a couple of Special Issues of the Journal of the Acoustical Society of America (JASA) that delve into the clinical and investigational facets of noise-induced hearing disorders.

Noise-Induced Hearing Disorders: Clinical and Investigational Tools
Guest Editors: Colleen G. Le Prell (Liaison Guest Editor), Odile H. Clavier, and Jianxin Bao

This special issue provides valuable insights into cutting-edge clinical and investigational tools designed to sensitively detect noise injury in the cochlea. Emphasizing the importance of sound exposure monitoring and protection, the collection explores tools available for characterizing individual noise hazards and attenuation. Throughout, there is a concentrated focus on the suitability of diverse functional measures for hearing and balance-related clinical trials, including considerations for boothless auditory test technology in decentralized clinical trials. Furthermore, the issue offers guidance on designing clinical trials to prevent noise-induced hearing deficits such as hearing loss and tinnitus.

World Hearing Day JASA special issue

Issue Highlights

Noise-Induced Hearing Loss: Translating Risk from Animal Models to Real-World Environments
Guest Editors: Colleen G. Le Prell, CAPT William J. Murphy, Tanisha L. Hammill, and J. R. Stefanson

Noise-induced hearing loss (NIHL) stands as a common injury for service members and civilian workers exposed to noise. This special issue focuses on translating knowledge from animal models to real-world environments. Contributors delve into the cellular and molecular events in the inner ear post-noise exposure, exploring potential pharmaceutical prevention of NIHL. The collection includes insights into methods and models used during preclinical assessments of investigational new drug agents, as well as information about human populations at risk for NIHL.


Together, these special issues provide an exploration of noise-induced hearing disorders, offering valuable insights and potential solutions for both clinical and real-world settings. Be sure to share this post to make ear and hearing care a reality for all! For more information about World Hearing Day 2024, visit https://www.who.int/campaigns/world-hearing-day/2024.

[i] Due to an unexpected site wide issue, the posting of this content was unfortunately delayed to after March 3, 2024.

4aSC12 – When it comes to recognizing speech, being in noise is like being old

Kristin Van Engen – kvanengen@wustl.edu
Avanti Dey
Nichole Runge
Mitchell Sommers
Brent Spehar
Jonathan E. Peelle

Washington University in St. Louis
1 Brookings Drive
St. Louis, MO 63130

Popular version of paper 4aSC12
Presented Thursday morning, May 10, 2018
175th ASA Meeting, Minneapolis, MN

How hard is it to recognize a spoken word?

Well, that depends. Are you old or young? How is your hearing? Are you at home or in a noisy restaurant? Is the word one that is used often, or one that is relatively uncommon? Does it sound similar to lots of other words in the language?

As people age, understanding speech becomes more challenging, especially in noisy situations like parties or restaurants. This is perhaps unsurprising, given the large proportion of older adults who have some degree of hearing loss. However, hearing measurements do not actually do a very good job of predicting the difficulty a person will have with speech recognition, and older adults tend to do worse than younger adults even when their hearing is good.

We also know that some words are more difficult to recognize than others. Words that are used rarely are more difficult than common words, and words that sound similar to many other words in the language are recognized less accurately than unique-sounding words. Relatively little is known, however, about how these kinds of challenges interact with background noise to affect the process of word recognition or how such effects might change across the lifespan.

In this study, we used eye tracking to investigate how noise and word frequency affect the process of understanding spoken words. Listeners were shown a computer screen displaying four images, and listened to the instruction “Click on the” followed by a target word (e.g., “Click on the dog.”). As the speech signal unfolds, the eye tracker records the moment-by-moment direction of the person’s gaze (60 times per second). Since listeners direct their gaze toward the visual information that matches incoming auditory information, this allows us to observe the process of word recognition in real time.
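The gaze record described above is typically summarized as the proportion of samples landing on the target picture within successive time bins. The sketch below illustrates that arithmetic with invented labels; it is not the analysis code used in the study, and the bin size is an assumption for illustration:

```python
# Hypothetical sketch: summarizing 60 Hz eye-tracking samples as the
# proportion of looks to the target picture in 100 ms time bins.
# (Illustrative only -- not the study's actual analysis pipeline.)

def fixation_proportions(gaze_samples, sample_rate=60, bin_ms=100):
    """gaze_samples: one label per sample, e.g. 'target' or 'other'.
    Returns the proportion of 'target' looks in each time bin."""
    samples_per_bin = max(1, int(sample_rate * bin_ms / 1000))  # 6 samples at 60 Hz
    proportions = []
    for start in range(0, len(gaze_samples), samples_per_bin):
        window = gaze_samples[start:start + samples_per_bin]
        proportions.append(sum(1 for g in window if g == "target") / len(window))
    return proportions

# Example: a listener who shifts from the distractor images to the target
gaze = ["other"] * 12 + ["target"] * 12    # 400 ms of data at 60 Hz
print(fixation_proportions(gaze))          # [0.0, 0.0, 1.0, 1.0]
```

Faster word recognition shows up in such curves as an earlier rise toward 1.0.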

Our results indicate that word recognition is slower in noise than in quiet, slower for low-frequency words than high-frequency words, and slower for older adults than younger adults. Interestingly, young adults were more slowed down by noise than older adults. The main difference, however, was that young adults were considerably faster to recognize words in quiet conditions. That is, word recognition by older adults didn’t differ much from quiet to noisy conditions, but young listeners looked like older listeners when tasked with listening to speech in noise.

1pSC26 – Acoustics and Perception of Charisma in Bilingual English-Spanish

Rosario Signorello – rsignorello@ucla.edu
Department of Head and Neck Surgery
31-20 Rehab Center,
Los Angeles, CA 90095-1794
Phone: +1 (323) 703-9549

Popular version of paper 1pSC26 “Acoustics and Perception of Charisma in Bilingual English-Spanish 2016 United States Presidential Election Candidates”
Presented at the 171st Meeting on Monday May 23, 1:00 pm – 5:00 pm, Salon F, Salt Lake Marriott Downtown at City Creek Hotel, Salt Lake City, Utah,

Charisma is the set of leadership characteristics, such as vision, emotions, and dominance used by leaders to share beliefs, persuade listeners and achieve goals. Politicians use voice to convey charisma and appeal to voters to gain social positions of power. “Charismatic voice” refers to the ensemble of vocal acoustic patterns used by speakers to convey personality traits and arouse specific emotional states in listeners. The ability to manipulate charismatic voice results from speakers’ universal and learned strategies to use specific vocal parameters (such as vocal pitch, loudness, phonation types, pauses, pitch contours, etc.) to convey their biological features and their social image (see Ohala, 1994; Signorello, 2014a, 2014b; Puts et al., 2006). Listeners’ perception of the physical, psychological and social characteristics of the leader is influenced by universal ways to emotionally respond to vocalizations (see Ohala, 1994; Signorello, 2014a, 2014b) combined with specific, culturally-mediated, habits to manifest emotional response in public (Matsumoto, 1990; Signorello, 2014a).

Politicians manipulate vocal acoustic patterns (adapting them to the culture, language, social status, educational background and the gender of the voters) to convey specific types of leadership, fulfilling everyone’s expectation of what charisma is. But what happens to leaders’ voices when they use different languages to address voters? This study investigates speeches of bilingual politicians to find out the vocal acoustic differences of leaders speaking in different languages. It also investigates how these acoustic differences can influence listeners’ perception of the type of leadership and the emotional states aroused by leaders’ voices.

We selected vocal samples from two bilingual American-English/American-Spanish politicians who participated in the 2016 United States presidential primaries: Jeb Bush and Marco Rubio. We chose words with similar vocal characteristics in terms of average vocal pitch, vocal pitch range, and loudness range. We asked listeners to rate the type of charismatic leadership perceived and to assess the emotional states aroused by those voices. We finally asked participants how the different vocal patterns would affect their voting preference.

Preliminary statistical analyses show that English words like “terrorism” (voice sample 1) and “security” (voice sample 2), characterized by mid vocal pitch frequencies, wide vocal pitch ranges, and wide loudness ranges, convey an intimidating, arrogant, selfish, aggressive, witty, overbearing, lazy, dishonest, and dull type of charismatic leadership. Listeners from different language and cultural backgrounds also reported these vocal stimuli triggered emotional states like contempt, annoyance, discomfort, irritation, anxiety, anger, boredom, disappointment, and disgust. The listeners who were interviewed considered themselves politically liberal and they responded that they would probably vote for a politician with the vocal characteristics listed above.

Speaker Jeb Bush. Mid vocal pitch frequencies (126 Hz), wide vocal pitch ranges (97 Hz), and wide loudness ranges (35 dB)

Speaker Marco Rubio. Mid vocal pitch frequencies (178 Hz), wide vocal pitch ranges (127 Hz), and wide loudness ranges (30 dB)

Results also show that Spanish words like “terrorismo” (voice sample 3) and “ilegal” (voice sample 4), characterized by an average of mid-low vocal pitch frequencies, mid vocal pitch ranges, and narrow loudness ranges, convey a personable, relatable, kind, caring, humble, enthusiastic, witty, stubborn, extroverted, understanding, but also weak and insecure type of charismatic leadership. Listeners from different language and cultural backgrounds also reported these vocal stimuli triggered emotional states like happiness, amusement, relief, and enjoyment. The listeners who were interviewed considered themselves politically liberal and they responded that they would probably vote for a politician with the vocal characteristics listed above.

Speaker Jeb Bush. Mid-low vocal pitch frequencies (95 Hz), mid vocal pitch ranges (40 Hz), and narrow loudness ranges (17 dB) 

Speaker Marco Rubio. Mid vocal pitch frequencies (146 Hz), wide vocal pitch ranges (75 Hz), and wide loudness ranges (25 dB)
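The measures in these captions reduce to simple statistics over frame-by-frame pitch (Hz) and intensity (dB) tracks: the mean of the pitch track, and the max-minus-min spread of each track. A minimal sketch with made-up frame values (not the study’s actual extraction pipeline):

```python
# Hypothetical sketch: caption statistics from frame-level voice tracks.
# The frame values below are invented for illustration.

def voice_stats(pitch_hz, intensity_db):
    """Mean pitch, pitch range, and loudness range from frame tracks."""
    mean_pitch = sum(pitch_hz) / len(pitch_hz)
    pitch_range = max(pitch_hz) - min(pitch_hz)
    loudness_range = max(intensity_db) - min(intensity_db)
    return mean_pitch, pitch_range, loudness_range

pitch = [100, 110, 126, 140, 152]      # Hz, one value per analysis frame
intensity = [55, 62, 70, 83, 90]       # dB
print(voice_stats(pitch, intensity))   # (125.6, 52, 35)
```

In practice, pitch and intensity tracks would come from an acoustic analysis tool rather than hand-entered values.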

Voice is a very dynamic non-verbal behavior used by politicians to persuade the audience and influence voting preference. The results of this study show how acoustic differences in voice convey different types of leadership and differently arouse the emotional states of listeners. The voice samples studied show how speakers Jeb Bush and Marco Rubio adapt their vocal delivery to audiences of different backgrounds. The two politicians voluntarily manipulate their voice parameters while speaking in order to appear endowed with different leadership qualities. The vocal pattern used in English conveys the threatening and dark side of their charisma, inducing the arousal of negative emotions, which triggers a positive voting preference in listeners. The vocal pattern used in Spanish conveys the charming and caring side of their charisma, inducing the arousal of positive emotions, which triggers a negative voting preference in listeners.

The manipulation of voice arouses emotional states that induce voters to consider a certain type of leadership as more appealing. Experiencing emotions helps voters to assess the effectiveness of a political leader. If the emotional arousal matches voters’ expectations of how a charismatic leader should make them feel, then voters will help the charismatic speaker to become their leader.

Signorello, R. (2014a). La Voix Charismatique: Aspects Psychologiques et Caractéristiques Acoustiques [The Charismatic Voice: Psychological Aspects and Acoustic Characteristics]. PhD Thesis. Université de Grenoble, Grenoble, France, and Università degli Studi Roma Tre, Rome, Italy.

Signorello, R. (2014b). The biological function of fundamental frequency in leaders’ charismatic voices. The Journal of the Acoustical Society of America 136 (4), 2295-2295.

Ohala, J. (1984). An ethological perspective on common cross-language utilization of F0 of voice. Phonetica, 41(1):1–16.

Puts, D. A., Hodges, C. R., Cárdenas, R. A., and Gaulin, S. J. C. (2007). Men’s voices as dominance signals: vocal fundamental and formant frequencies influence dominance attributions among men. Evolution and Human Behavior, 28(5):340–344.

2aMU4 – Yelling vs. Screaming in Operatic and Rock Singing

Lisa Popeil – lisa@popeil.com
14431 Ventura Blvd #200
Sherman Oaks, CA 91423

Popular version of paper 2aMU4
Presented Tuesday morning, May 24, 2016

There are a number of ways the human vocal folds can vibrate, each creating a unique sound used in singing. The two most common vibrational patterns of the vocal folds are commonly called “chest voice” and “head voice”, with chest voice sounding like speaking or yelling and head voice sounding more flute-like, or like screaming on high pitches. In the operatic singing tradition, men sing primarily in chest voice while women sing primarily in their head voice. However, in rock singing, men often emit high screams using their head voice while female rock singers use almost exclusively their chest voice for high notes.

Vocal fold vibrational pattern differences are only a part of the story though, since the shaping of the throat, mouth and nose (the vocal tract) plays a large part in the perception of the final sound. That means that head voice can be made to “sound” like chest voice on high screams using vocal tract shaping, and only the most experienced listener can determine whether the vocal register used was chest or head voice.

Using spectrographic analysis, differences and similarities between operatic and rock singers can be seen.  One similarity between the two is the heightened output of a resonance commonly called “ring”.  This resonance, when amplified by vocal tract shaping, creates a piercing sound that’s perceived by the listener as extremely loud. The amplified ring harmonics can be seen in the 3,000 Hz band in both the male opera sample and in rock singing samples:

Figure 1: Male opera – high B (B4, 494 Hz), chest voice
Figure 2: Male rock – high E (E5, 659 Hz), chest voice
Figure 3: Male rock – high G (G5, 784 Hz), head voice

Though each of these three male singers exhibits a unique frequency signature, and whether singing in chest or head voice, each singer is using the amplified ring strategy in the 3,000 Hz range to amplify his sound and create excitement.
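One crude way to see the “ring” idea numerically is to ask what fraction of a note’s spectral energy falls in a band around 3,000 Hz. The sketch below does this with a plain FFT on a synthetic tone; the band edges, the synthetic signal, and the boosted harmonic are all assumptions for illustration, not the spectrographic method used here:

```python
# Hypothetical sketch: estimating how much of a sung note's energy falls in
# the "ring" region near 3,000 Hz, using a plain FFT magnitude spectrum.
import numpy as np

def band_energy_fraction(signal, sr, lo=2500.0, hi=3500.0):
    """Fraction of total spectral energy between lo and hi (Hz)."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    band = (freqs >= lo) & (freqs <= hi)
    return spectrum[band].sum() / spectrum.sum()

# Synthetic "voice": a 494 Hz fundamental (the opera singer's B4) plus a
# boosted 6th harmonic at 2,964 Hz standing in for the amplified ring.
sr = 44100
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 494 * t) + 2.0 * np.sin(2 * np.pi * 494 * 6 * t)
print(round(band_energy_fraction(tone, sr), 2))  # 0.8: most energy in the ring band
```

A real voice has many more harmonics, but a high value of this fraction corresponds to the bright band visible around 3,000 Hz in the spectrograms above.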