The Impact of Formal Musical Training on Speech Comprehension in Heavily Distracting Environments

Alexandra Bruder – alexandra.l.bruder@vanderbilt.edu

Vanderbilt University Medical Center, Department of Anesthesiology, 1211 21st Avenue South, Medical Arts Building, Suite 422, Nashville, TN, 37212, United States

Joseph Schlesinger – joseph.j.schlesinger@vumc.org
Twitter: @DrJazz615

Vanderbilt University Medical Center
Nashville, TN 37205
United States

Clayton D Rothwell – crothwell@infoscitex.com
Infoscitex Corporation, a DCS Company
Dayton, OH, 45431
United States

Popular version of 1pMU4-The Impact of Formal Musical Training on Speech Intelligibility Performance – Implications for Music Pedagogy in High-Consequence Industries, presented at the 183rd ASA Meeting.

Imagine being a waiter… everyone in the restaurant is speaking, music is playing, and co-workers are trying to get your attention, causing you to miss the customer’s order. Communication is necessary but can be hindered due to distractions in many environments, especially in high-risk environments, such as aviation, nuclear power, and healthcare, where miscommunication is a frequent contributing factor to accidents and loss of life. In domains where multitasking is necessary and timely and accurate responses must be ensured, does formal music training help performance?

We used an audio-visual task to test whether formal music training can be useful in multitasking environments. Twenty-five students from Vanderbilt University participated in the study and were separated into groups based on their level of formal music training: no formal music training, 1-3 years, 3-5 years, and 5+ years of formal music training. Participants were given three tasks to attend to: a speech comprehension task (modeling distracted communication), a complex visual distraction task (modeling a clinical patient monitor), and an easy visual distraction task (modeling an alarm monitoring task). These tasks were completed in the presence of alarms, background noise, or both, and with or without background music.

Image courtesy of Bruder et al. original paper. (Psychology of Music).

Our research focused on the results of the audio comprehension task. The group with the most formal music training showed no change in response rate when background music was added, while all the other groups did. This means that with enough music training, background music does not influence how often participants respond! Additionally, the number of times the participants responded to the audio task depended on the degree of formal music training. Participants with no formal music training had the highest response rate, followed by the 1-3 year group, then the 3-5 year group, with the 5+ year group having the lowest response rate. However, all participants were similar in accuracy overall, and accuracy decreased for all groups when background music was playing. Given the similar accuracy among groups, but less frequent responding with more formal music training, it appears that formal music training helps participants refrain from responding when they don’t know the answer.

Image courtesy of Bruder et al. original paper (Psychology of Music).

Why does this matter? There are many situations in which responding and getting something wrong can be more detrimental than not responding, especially under time pressure, where mistakes are costly to correct. Although accuracy was similar across all groups, the groups with some formal music training seemed to respond with overconfidence but did not know enough to increase accuracy, resulting in a potentially dangerous situation. This contrasts with the group with 5+ years of formal music training, who showed no effect of background music on response rate, used their trained ears to better judge how well they understood the information, and were less eager to respond to a difficult task under distraction. It turns out that those middle school band lessons paid off after all, at least if you work in a distracting, multitasking environment.

Diverse Social Networks Reduce Accent Judgments

Perception in context: How racialized identities impact speech perception

Media Contact:
Larry Frum
AIP Media
301-209-3090
media@aip.org

DENVER, May 24, 2022 – Everyone has an accent. But the intelligibility of speech doesn’t just depend on that accent; it also depends on the listener. Visual cues and the diversity of the listener’s social network can impact their ability to understand and transcribe sentences after listening to the spoken word.

Ethan Kutlu, of the University of Iowa, will discuss this social phenomenon in his presentation, “Perception in context: How racialized identities impact speech perception,” which will take place May 24 at 12:15 p.m. Eastern U.S. as part of the 182nd Meeting of the Acoustical Society of America at the Sheraton Denver Downtown Hotel.

Kutlu and his team paired American, British, and Indian varieties of English with images of white and South Asian faces. While the accents differed, they were all normalized to have the same base intelligibility. They played these voices for listeners from a low-diversity environment (Gainesville, Florida) and a high-diversity environment (Montreal, Quebec).

“Racial and linguistic diversity in our social networks and in our surrounding environments impact how we engage in perceiving speech. Encountering new voices and accents that are different from our own improves our ability to attend to speech that varies from our own,” said Kutlu. “We all have accents and embracing this is not hurting our own or others’ speech perception. On the contrary, it helps all of us.”

A participant’s ability to transcribe sentences decreased and they rated voices as more accented whenever the speech was paired with a South Asian face – no matter the English variety of the spoken word. Indian English paired with white faces was judged as heavily accented when compared to British and American English.

However, these results varied greatly by the listener’s social network and geographic context. Montreal participants, residents of a dual language city, were overall more accurate when transcribing speech. They did not change their judgments based on the faces they saw on the screen.

———————– MORE MEETING INFORMATION ———————–
USEFUL LINKS
Main meeting website: https://acousticalsociety.org/asa-meetings/
Technical program: https://eventpilotadmin.com/web/planner.php?id=ASASPRING22
Press Room: https://acoustics.org/world-wide-press-room/

WORLDWIDE PRESS ROOM
In the coming weeks, ASA’s Worldwide Press Room will be updated with additional tips on dozens of newsworthy stories and with lay language papers, which are 300 to 500 word summaries of presentations written by scientists for a general audience and accompanied by photos, audio and video. You can visit the site during the meeting at https://acoustics.org/world-wide-press-room/.

PRESS REGISTRATION
We will grant free registration to credentialed journalists and professional freelance journalists. If you are a reporter and would like to attend, contact AIP Media Services at media@aip.org. For urgent requests, staff at media@aip.org can also help with setting up interviews and obtaining images, sound clips, or background information.

ABOUT THE ACOUSTICAL SOCIETY OF AMERICA
The Acoustical Society of America (ASA) is the premier international scientific society in acoustics devoted to the science and technology of sound. Its 7,000 members worldwide represent a broad spectrum of the study of acoustics. ASA publications include The Journal of the Acoustical Society of America (the world’s leading journal on acoustics), JASA Express Letters, Proceedings of Meetings on Acoustics, Acoustics Today magazine, books, and standards on acoustics. The society also holds two major scientific meetings each year. See https://acousticalsociety.org/.

4aSC3 – Talkers prepare their lips before audibly speaking – Is this the same thing as coarticulated speech?

Peter A. Krause – peter.krause066@csuci.edu

CSU Channel Islands
One University Dr.
Camarillo, CA 93012

Popular version of 4aSC3 – Understanding anticipatory speech postures: does coarticulation extend outside the acoustic utterance?
Presented 9:45 Thursday Morning, May 26, 2022
182nd ASA Meeting

A speech sound like /s/ is not fixed. The sound at the beginning of “soon” is not identical to the sound at the beginning of “seen.” We call this contextual variability coarticulation.

Verbal recording of “soon” and “seen.” Listen closely to the subtle differences in the initial /s/ sound.

A spectrogram of the same recording of “soon” and “seen.” Note how the /s/ sounds have a slightly different distribution of intensity over the frequency range shown.

Some theoretical models explain coarticulation by assuming that talkers retrieve slightly different versions of their /s/ sound from memory, depending on the sound to follow. Others emphasize that articulatory actions overlap in time, rather than keeping to a regimented sequence: Talkers usually start rounding their lips for the /u/ (“oo”) sound in “soon” while still making the hissy sound of the /s/, instead of waiting for the voiced part of the vowel. (See picture below.) But even overlapping action accounts of coarticulation disagree on how “baked in” the coarticulation is. Does the dictionary in your head tell you that the word “soon” is produced with a rounded /s/? Or is there a more general, flexible process whereby if you know an /u/ sound is coming up, you round your lips if able?

A depiction of the same speaker’s lips when producing the /s/ sound in “seen” (top) and the /s/ sound in “soon” (bottom). Note that in the latter case, the lips are already rounded.

If the latter, it is reasonable to ask whether coarticulation only happens during the audible portions of speech. My work suggests that the answer is no! For example, I have shown that during word-reading tasks, talkers tend to pre-round their lips a bit if they have been led to believe that an upcoming (but not yet seen) word will include an /u/ sound. This effect goes away if the word is equally likely to have an /u/ sound or an /i/ (“ee”) sound. More recently, I have shown that talkers awaiting their turn in natural, bi-directional conversation anticipate their upcoming utterance with their lips, by shaping them in sound-specific ways. (At least, they do so when preparing very short phrases like “yeah” or “okay.” For longer phrases, this effect disappears, which remains an interesting mystery.) Nevertheless, talkers apparently “lean forward” into their speech actions some of the time. In my talk I will argue that much of what we call “coarticulation” may be a special case of a more general pattern relating speech planning to articulatory action. In fact, it may reflect processes generally at work in all human action planning.

Plots of lip area taken from my recent study of bi-directional conversation. Plots trace backward in time from the moment at which audible speech began (Latency 0). “Labially constrained” utterances are those requiring shrunken-down lips, like those starting with /p/ or having an early /u/ sound. Note that for short phrases, lip areas are partially set several seconds before audible speech begins.

2aSC2 – Identifying race from speech

Yolanda Holt – holty@ecu.edu

East Carolina University
600 Moye Boulevard
Greenville, NC 27834

Tessa Bent – tbent@indiana.edu

Indiana University
2631 East Discovery Parkway
Bloomington, IN 47408

Popular version of 2aSC2 – Socio-ethnic expectations of race in speech perception
Presented Tuesday morning May 24, 2022
182nd ASA Meeting

Did I really have you at Hello?! Listening to a person we don’t see, we make mental judgments about the speaker such as their age, presenting sex (man or woman), regional dialect and sometimes their race. At times, we can accurately categorize the person from hearing just a single word, like hello. We wanted to know if listeners from the same community could listen to single words and accurately categorize the race of a speaker better than listeners from far away. We also wanted to know if regional dialect differences between Black and white speakers would interfere with accurate race identification.

In this listening experiment, people from North Carolina and Indiana heard single words produced by 24 Black and white talkers from two communities in North Carolina. Both Black and white people living in the western North Carolina community near the Blue Ridge Mountains are participating in a sound change known as the Southern Vowel Shift.

It is thought that the Southern Vowel Shift makes the vowels in the word pair heed and hid sound alike, and likewise the vowels in the word pair heyd and head. It is also thought that many white Southern American English speakers produce the vowel in the word whod with rounded lips.

In the eastern community near the Atlantic coast of North Carolina, Southern American English speakers do not produce the vowels in the word pair heed and hid alike, nor the vowels in the word pair heyd and head. In this community it is also expected that many white Americans produce the vowel in the word whod with rounded lips.

Black and white listeners, both men and women, from Indiana and North Carolina heard recordings of the eastern and western talkers saying the words heed, hid, heyd, head, had, hood, whod, hid, howed, hoyd in random order a total of 480 times.

The North Carolina listeners, as expected, completed the race categorization task with greater accuracy than the Indiana listeners. Both listener groups categorized east and west white talkers and east Black talkers with around 80% accuracy. West Black talkers, who are participating in the Southern sound change, were the most difficult to categorize. They were identified at just above 55% accuracy.

We interpret the results to suggest that when a talker’s speech does not meet the listener’s expectations, it is difficult for listeners to categorize the race of the speaker.

In this experiment, the white talkers from both communities were expected to produce the vowel in whod in a manner similar to each other. In contrast, the west Black talkers were expected to produce several vowels (those in heed, hid, heyd, and head) similarly to their west white peers and differently from the east Black talkers. We thought this difference would make it difficult for listeners to accurately categorize the race of the west Black talkers by their speech alone. The results suggest that listener accuracy in race identification decreases when the speech produced doesn’t meet the listener’s mental expectations of what a talker should sound like.

Answer key to sound file (bb bb bb ww ww ww bb bb bb ww ww ww)

3aSC4 – Effects of two-talker child speech on novel word learning in preschool-age children

Tina M. Grieco-Calub, tina_griecocalub@rush.edu
Rush University Medical Center
Rush NeuroBehavioral Center

Popular version of 3aSC4 – Effects of two-talker child speech on novel word learning in preschool-age children
Presented Wednesday morning, May 25, 2022
182nd ASA Meeting in Denver, Colorado

One of the most important tasks for children during preschool and kindergarten is building vocabulary knowledge. This vocabulary is the foundation upon which later academic knowledge and reading skills are built. Children acquire new words through exposure to speech by other people including their parents, teachers, and friends. However, this exposure does not occur in a vacuum. Rather, these interactions often occur in situations where there are other competing sounds, including other people talking or environmental noise. Think back to a time when you tried to have a conversation with someone in a busy restaurant with multiple other conversations happening around you. It can be difficult to focus on the conversation of interest and ignore the other conversations in noisy settings.

Now, think about how a preschool- or kindergarten-aged child might navigate a similar situation, such as a noisy classroom. This child has less mature language and cognitive skills compared to you. Therefore, they have a harder time ignoring those irrelevant conversations to process what the teacher says. Also, children in classrooms must hear and understand the words they know and learn new words. Children who have a hard time ignoring the background noise can have a particularly hard time building essential vocabulary knowledge in classroom settings.

In this study, we are testing the extent to which background speech, like what might occur in a preschool classroom, influences word learning in preschool- and kindergarten-aged children. We are testing children’s ability to learn and remember unfamiliar words either in quiet or in a noise condition in which two other children are talking in the background. In the noise condition, the volume of the teacher is slightly louder than the background talkers, like what a child would experience in a classroom. During the word learning task, children are first shown unfamiliar objects and are asked to repeat their names (e.g., “This is a topin. You say topin”; see attached movie clip). Children then receive training on the objects and their names. After training, children are asked to name each object. Children’s performance is quantified by how close their production of the object’s name is to the actual name. For example, a child might call the “topin” a “dobin”. Preliminary results suggest that children in quiet and in noise are fairly accurate at repeating the unfamiliar words: they can focus on the teacher’s speech and repeat all the sounds of the word immediately, regardless of condition. Children can also learn the words in both quiet and noise. However, children’s spoken productions of the words are less accurate when they are trained in noise than in quiet. These findings tentatively suggest that when there is background noise, children need more training to learn the precise sounds of words. We will be addressing this issue in future iterations of this study.
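To make the idea of "closeness" between a child's production and the target name concrete, here is a minimal sketch. It is an illustration only, not the scoring method used in the study: it assumes each word is written as a sequence of symbols (plain letters here, standing in for speech sounds) and uses a standard edit distance, which counts the fewest single-symbol substitutions, insertions, and deletions needed to turn one sequence into the other. The function names `edit_distance` and `closeness` are hypothetical.

```python
def edit_distance(target: str, production: str) -> int:
    """Levenshtein distance: fewest single-symbol edits between two strings."""
    # prev[j] holds the distance between the first i-1 symbols of target
    # and the first j symbols of production.
    prev = list(range(len(production) + 1))
    for i, t_sym in enumerate(target, start=1):
        curr = [i]
        for j, p_sym in enumerate(production, start=1):
            substitution_cost = 0 if t_sym == p_sym else 1
            curr.append(min(
                prev[j] + 1,                      # delete a target symbol
                curr[j - 1] + 1,                  # insert a production symbol
                prev[j - 1] + substitution_cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def closeness(target: str, production: str) -> float:
    """Score from 0 (nothing shared) to 1 (identical); an assumed scale."""
    longest = max(len(target), len(production))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(target, production) / longest


# "topin" vs. "dobin": two of the five symbols differ (t/d and p/b).
print(edit_distance("topin", "dobin"))
print(closeness("topin", "dobin"))
```

Under this assumed scoring, a production with fewer changed sounds earns a score nearer to 1, which is one simple way to compare productions after training in quiet with productions after training in noise.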

Stuttering Starts at Speech Initiation, Not Due to Impaired Motor Skills

Theory suggests anomalies in the brain’s initiation circuit cause stuttering

Media Contact:
Larry Frum
AIP Media
301-209-3090
media@aip.org

SEATTLE, November 30, 2021 — About one in 20 people go through a period of stuttering during childhood. Until the latter half of the 20th century, stuttering was believed to be a psychological problem stemming from lack of effort or from trauma.

However, techniques in neuroimaging are leading to a much better understanding of brain function during speech and how stuttering arises. Frank Guenther, from Boston University, will present his findings on the origins of stuttering at the 181st Meeting of the Acoustical Society of America, which runs from Nov. 29 to Dec. 3, at the Hyatt Regency Seattle. The talk, “A neurocomputational view of developmental stuttering,” will take place Tuesday, Nov. 30 at 2:15 p.m. Eastern U.S.

Guenther compares speech to a jukebox that plays CDs. The jukebox has two circuits: one that chooses a CD and one that plays the CD.

Inside the brain, this corresponds to one circuit initiating the desired speech in the basal ganglia, while another circuit coordinates the muscles needed to generate the speech. Stuttering stems from the initiation of speech, so only the first of the two circuits is impaired.

“In stuttering, the CDs themselves are fine, but the mechanism for choosing them is impaired,” said Guenther.

This theory matches behavioral observations of stuttering. People will often speak words fluently later in a sentence, even if the same words cause stuttering at the beginning of a sentence.

Guenther and his team created computational models of how the speech initiation circuit performs in a nonstuttering individual. Because Parkinson’s disease also affects the initiation circuit, they can compare these models directly to data taken from the basal ganglia during deep brain stimulation surgery in patients with the disease.

“This gives us a fighting chance of finding the specific problems underlying stuttering and addressing them with highly targeted drugs or technological treatments that have minimal unwanted side effects,” said Guenther.
