Achieving Linguistic Justice for African American English #ASA184

African American English varies systematically and is internally consistent; a proper understanding of this variation prevents the misdiagnosis of speech and language disorders.

Media Contact:
Ashley Piccone
AIP Media
301-209-3090
media@aip.org

CHICAGO, May 10, 2023 – African American English (AAE) is a variety of English spoken primarily, though not exclusively, by Black Americans of historical African descent. Because AAE varies from white American English (WAE) in systematic ways, speech and hearing specialists unfamiliar with the variety may misidentify differences in speech production as a speech disorder. Professional understanding of the difference between typical variation and errors in the language system is the first step toward accurately identifying disorder and establishing linguistic justice for AAE speakers.

(left) A 5-year-old AAE-speaking girl’s production of “elephant.” When the /t/ sound in /nt/ is produced, the AAE speaker makes less aspiration noise, and her /t/ lasts a shorter time than the WAE /t/. The duration of the word is 740 milliseconds (0.74 seconds). (right) A 5-year-old WAE-speaking girl’s production of “elephant.” Her /t/ in /nt/ carries much more aspiration noise and lasts longer than the AAE /t/. The duration of the entire word is 973 milliseconds (0.97 seconds). Both girls produce intelligible versions of the word “elephant.”
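For readers curious how such measurements are made, the sketch below shows one way to estimate word duration and high-frequency aspiration noise from a recording. This is a minimal illustration, not the researchers’ pipeline; the file names and the 4-7 kHz “aspiration band” are assumptions made for the example.

```python
# Minimal sketch (not the researchers' pipeline): estimating word duration
# and /t/-release aspiration noise from a recording. File names and the
# 4-7 kHz "aspiration band" are illustrative assumptions; mono audio with
# a sample rate above 14 kHz is assumed.
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, sosfilt

def word_duration_s(path, silence_db=-40.0):
    """Duration of the audible portion, trimming leading/trailing silence."""
    rate, x = wavfile.read(path)
    x = x.astype(np.float64)
    peak = np.max(np.abs(x))
    if peak > 0:
        x /= peak
    level_db = 20 * np.log10(np.abs(x) + 1e-10)
    audible = np.flatnonzero(level_db > silence_db)
    return (audible[-1] - audible[0]) / rate

def aspiration_energy(path, lo_hz=4000.0, hi_hz=7000.0):
    """Energy in a high-frequency band, a rough proxy for aspiration noise."""
    rate, x = wavfile.read(path)
    sos = butter(4, [lo_hz, hi_hz], btype="bandpass", fs=rate, output="sos")
    return float(np.sum(sosfilt(sos, x.astype(np.float64)) ** 2))

for path in ["elephant_aae.wav", "elephant_wae.wav"]:  # hypothetical files
    print(path, f"duration {word_duration_s(path):.3f} s,",
          f"aspiration energy {aspiration_energy(path):.1f}")
```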

In her presentation, “Kids talk too: Linguistic justice and child African American English,” Yolanda Holt of East Carolina University will describe aspects of the systematic variation between AAE and WAE speech production in children. The talk will take place Wednesday, May 10, at 10:50 a.m. Eastern U.S. in the Los Angeles/Miami/Scottsdale room, as part of the 184th Meeting of the Acoustical Society of America running May 8-12 at the Chicago Marriott Downtown Magnificent Mile Hotel.

Common characteristics of AAE speech include variation at all linguistic levels, from sound production at the word level to the choice of commentary in professional interpersonal interactions. Frequent features of AAE are final consonant reduction/deletion and final consonant cluster reduction. Holt provided the following example to illustrate linguistic variation from the word level to the interpersonal level.

“In the professional setting, if one AAE-speaking professional woman wanted to compliment the attire of the other, the exchange might sound something like this: [Speaker 1] ‘I see you rockin’ the tone on tone.’ [Speaker 2] ‘Frien’, I’m jus’ tryin’ to be like you wit’ the fully executive flex.’”

This example, in addition to using common aspects of AAE word shape, shows how the choice to use AAE in a professional setting is a way for the two women to share a message beyond the words.

“This exchange illustrates a complex and nuanced cultural understanding between the two speakers. In a few words, they communicate professional respect and a subtle appreciation for the intricate balance that African American women navigate in bringing their whole selves to the corporate setting,” said Holt.

Holt and her team examined final consonant cluster reduction (e.g., expressing “shift” as “shif’”) in 4- and 5-year-old children. Using instrumental acoustic phonetic analysis, they discovered that the variation in final consonant production in AAE is likely not a wholesale elimination of word endings but is perhaps a difference in aspects of articulation.

“This is an important finding because it could be assumed that if a child does not fully articulate the final sound, they are not aware of its existence,” said Holt. “By illustrating that the AAE-speaking child produces a variation of the final sound, not a wholesale removal, we help to eliminate the mistaken idea that AAE speakers don’t know the ending sounds exist.”

Holt believes the fields of speech and language science, education, and computer science should expect and accept such variation in human communication. Linguistic justice occurs when we accept variation in human language without penalizing the user or defining their speech as “wrong.”

“Language is alive. It grows and changes over each generation,” said Holt. “Accepting the speech and language used by each generation and each group of speakers is an acceptance of the individual, their life, and their experience. Acceptance, not tolerance, is the next step in the march towards linguistic justice. For that to occur, we must learn from our speakers and educate our professionals that different can be typical. It is not always disordered.”

———————– MORE MEETING INFORMATION ———————–
Main meeting website: https://acousticalsociety.org/asa-meetings/
Technical program: https://eppro02.ativ.me/web/planner.php?id=ASASPRING23&proof=true

ASA PRESS ROOM
In the coming weeks, ASA’s Press Room will be updated with newsworthy stories and the press conference schedule at https://acoustics.org/asa-press-room/.

LAY LANGUAGE PAPERS
ASA will also share dozens of lay language papers about topics covered at the conference. Lay language papers are 300- to 500-word summaries of presentations written by scientists for a general audience. They will be accompanied by photos, audio, and video. Learn more at https://acoustics.org/lay-language-papers/.

PRESS REGISTRATION
ASA will grant free registration to credentialed and professional freelance journalists. If you are a reporter and would like to attend the meeting or virtual press conferences, contact AIP Media Services at media@aip.org. For urgent requests, AIP staff can also help with setting up interviews and obtaining images, sound clips, or background information.

ABOUT THE ACOUSTICAL SOCIETY OF AMERICA
The Acoustical Society of America (ASA) is the premier international scientific society in acoustics devoted to the science and technology of sound. Its 7,000 members worldwide represent a broad spectrum of the study of acoustics. ASA publications include The Journal of the Acoustical Society of America (the world’s leading journal on acoustics), JASA Express Letters, Proceedings of Meetings on Acoustics, Acoustics Today magazine, books, and standards on acoustics. The society also holds two major scientific meetings each year. See https://acousticalsociety.org/.

Hey Siri, Can You Hear Me? #ASA184

Experiments show how speech and comprehension change when people communicate with artificial intelligence.

Media Contact:
Ashley Piccone
AIP Media
301-209-3090
media@aip.org

CHICAGO, May 9, 2023 – Millions of people now regularly communicate with AI-based devices, such as smartphones, speakers, and cars. Studying these interactions can improve AI’s ability to understand human speech and reveal how talking with technology affects our language.

In their talk, “Clear speech in the new digital era: Speaking and listening clearly to voice-AI systems,” Georgia Zellou and Michelle Cohn of the University of California, Davis will describe experiments to investigate how speech and comprehension change when humans communicate with AI. The presentation will take place Tuesday, May 9, at 12:40 p.m. Eastern U.S. in the Los Angeles/Miami/Scottsdale room, as part of the 184th Meeting of the Acoustical Society of America running May 8-12 at the Chicago Marriott Downtown Magnificent Mile Hotel.

Humans change their voice when communicating with AI. Credit: Michelle Cohn

In their first line of questioning, Zellou and Cohn examined how people adjust their voice when communicating with an AI system compared to talking with another human. They found the participants produced louder and slower speech with less pitch variation when they spoke to voice-AI (e.g., Siri, Alexa), even across identical interactions.
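As a concrete illustration, the sketch below shows how those three measures (loudness, speaking rate via utterance duration, and pitch variation) might be extracted from a recording. It is an assumed analysis for illustration, not the authors’ code; the librosa library and the file name are stand-ins.

```python
# Minimal sketch (assumed analysis, not the authors' code) of the three
# acoustic measures discussed above. "utterance.wav" is a hypothetical file.
import numpy as np
import librosa

y, sr = librosa.load("utterance.wav", sr=None)

# Loudness: mean RMS energy, expressed in dB.
rms = librosa.feature.rms(y=y)[0]
loudness_db = 20 * np.log10(np.mean(rms) + 1e-10)

# Rate proxy: utterance duration (slower speech -> longer duration
# for the same text).
duration_s = len(y) / sr

# Pitch variation: standard deviation of F0 across voiced frames.
f0, voiced_flag, _ = librosa.pyin(y, fmin=75, fmax=400, sr=sr)
pitch_sd_hz = np.nanstd(f0[voiced_flag])

print(f"loudness {loudness_db:.1f} dB, duration {duration_s:.2f} s, "
      f"F0 variation {pitch_sd_hz:.1f} Hz")
```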

On the listening side, the researchers showed that how humanlike a device sounds affects how well listeners understand it. If listeners think the voice they hear belongs to a device, they understand it less accurately; if it sounds more humanlike, comprehension improves. Clear speech, in the style of a newscaster, was better understood overall, even when it was machine-generated.

“We do see some differences in patterns across human- and machine-directed speech: People are louder and slower when talking to technology. These adjustments are similar to the changes speakers make when talking in background noise, such as in a crowded restaurant,” said Zellou. “People also have expectations that the systems will misunderstand them and that they won’t be able to understand the output.”

Clarifying what makes a speaker intelligible will be useful for voice technology. For example, these results suggest that text-to-speech voices should adopt a “clear” style in noisy conditions.

Looking forward, the team aims to apply these studies to people from different age groups and social and language backgrounds. They also want to investigate how people learn language from devices and how linguistic behavior adapts as technology changes.

“There are so many open questions,” said Cohn. “For example, could voice-AI be a source of language change among some speakers? As technology advances, such as with large language models like ChatGPT, the boundary between human and machine is changing – how will our language change with it?”

———————– MORE MEETING INFORMATION ———————–
Main meeting website: https://acousticalsociety.org/asa-meetings/
Technical program: https://eppro02.ativ.me/web/planner.php?id=ASASPRING23&proof=true

ASA PRESS ROOM
In the coming weeks, ASA’s Press Room will be updated with newsworthy stories and the press conference schedule at https://acoustics.org/asa-press-room/.

LAY LANGUAGE PAPERS
ASA will also share dozens of lay language papers about topics covered at the conference. Lay language papers are 300- to 500-word summaries of presentations written by scientists for a general audience. They will be accompanied by photos, audio, and video. Learn more at https://acoustics.org/lay-language-papers/.

PRESS REGISTRATION
ASA will grant free registration to credentialed and professional freelance journalists. If you are a reporter and would like to attend the meeting or virtual press conferences, contact AIP Media Services at media@aip.org. For urgent requests, AIP staff can also help with setting up interviews and obtaining images, sound clips, or background information.

ABOUT THE ACOUSTICAL SOCIETY OF AMERICA
The Acoustical Society of America (ASA) is the premier international scientific society in acoustics devoted to the science and technology of sound. Its 7,000 members worldwide represent a broad spectrum of the study of acoustics. ASA publications include The Journal of the Acoustical Society of America (the world’s leading journal on acoustics), JASA Express Letters, Proceedings of Meetings on Acoustics, Acoustics Today magazine, books, and standards on acoustics. The society also holds two major scientific meetings each year. See https://acousticalsociety.org/.

Vocal Tract Size, Shape Dictate Speech Sounds

Main anatomical shape factors of the vocal tract. Credit: Antoine Serrurier

WASHINGTON, March 21, 2023 – Only humans have the ability to use speech. Remarkably, this communication is understandable across accent, social background, and anatomy despite a wide variety of ways to produce the necessary sounds. In JASA, published on behalf of the Acoustical Society of America by AIP Publishing, researchers present morphological and acoustic modeling of the vocal tract (see the article details below).

From the Journal: The Journal of the Acoustical Society of America
Article: Morphological and acoustic modeling of the vocal tract
DOI: 10.1121/10.0017356

The Impact of Formal Musical Training on Speech Comprehension in Heavily Distracting Environments

Alexandra Bruder – alexandra.l.bruder@vanderbilt.edu

Vanderbilt University Medical Center, Department of Anesthesiology, 1211 21st Avenue South, Medical Arts Building, Suite 422, Nashville, TN, 37212, United States

Joseph Schlesinger – joseph.j.schlesinger@vumc.org
Twitter: @DrJazz615

Vanderbilt University Medical Center
Nashville, TN 37205
United States

Clayton D. Rothwell – crothwell@infoscitex.com
Infoscitex Corporation, a DCS Company
Dayton, OH, 45431
United States

Popular version of 1pMU4 – The Impact of Formal Musical Training on Speech Intelligibility Performance – Implications for Music Pedagogy in High-Consequence Industries, presented at the 183rd ASA Meeting.

Imagine being a waiter… everyone in the restaurant is speaking, music is playing, and co-workers are trying to get your attention, causing you to miss the customer’s order. Communication is necessary but easily hindered by distraction, especially in high-risk settings such as aviation, nuclear power, and healthcare, where miscommunication is a frequent contributing factor to accidents and loss of life. In domains where multitasking is necessary and responses must be timely and accurate, does formal music training help performance?

We used an audio-visual task to test whether formal music training helps in multitasking environments. Twenty-five students from Vanderbilt University participated in the study and were separated into groups based on their level of formal music training: none, 1-3 years, 3-5 years, and 5+ years. Participants attended to three tasks at once: a speech comprehension task (modeling distracted communication), a complex visual distraction task (modeling a clinical patient monitor), and an easy visual distraction task (modeling an alarm monitoring task). These tasks were completed in the presence of alarms and/or background noise, and with or without background music.

Image courtesy of Bruder et al., original paper (Psychology of Music).

Our analysis of the audio comprehension task showed that the group with the most formal music training did not change its response rate when background music was added, while all of the other groups did. In other words, with enough music training, background music stopped influencing whether participants responded. The response rate itself also depended on the degree of formal music training: participants with no formal music training responded most often, followed by the 1-3-year group, then the 3-5-year group, with the 5+ year group responding least often. However, accuracy was similar across all groups, and it decreased for every group when background music was playing. Given the similar accuracy among groups but less frequent responding with more formal music training, it appears that formal music training helps participants hold back when they don’t know the answer.

Image courtesy of Bruder et al., original paper (Psychology of Music).

Why does this matter? In many situations, responding and getting something wrong can be more detrimental than not responding at all, especially under time pressure, when mistakes are costly to correct. Although accuracy was similar between all groups, the groups with some formal music training seemed to respond with overconfidence: they answered often but did not know enough to increase accuracy, a potentially dangerous combination. This contrasts with the 5+ year group, whose response rate was unaffected by background music and who used their trained ears to better judge the extent of their understanding, making them less eager to respond to a difficult task under distraction. It turns out those middle school band lessons paid off after all, at least if you work in a distracting, multitasking environment.

Diverse Social Networks Reduce Accent Judgments

Perception in context: How racialized identities impact speech perception

Media Contact:
Larry Frum
AIP Media
301-209-3090
media@aip.org

DENVER, May 24, 2022 – Everyone has an accent. But the intelligibility of speech doesn’t just depend on that accent; it also depends on the listener. Visual cues and the diversity of the listener’s social network can impact their ability to understand and transcribe sentences after listening to the spoken word.

Ethan Kutlu, of the University of Iowa, will discuss this social phenomenon in his presentation, “Perception in context: How racialized identities impact speech perception,” which will take place May 24 at 12:15 p.m. Eastern U.S. as part of the 182nd Meeting of the Acoustical Society of America at the Sheraton Denver Downtown Hotel.

Kutlu and his team paired American, British, and Indian varieties of English with images of white and South Asian faces. While the accents differed, all recordings were normalized to have the same baseline intelligibility. They played these voices for listeners from a low-diversity environment (Gainesville, Florida) and a high-diversity environment (Montreal, Quebec).

“Racial and linguistic diversity in our social networks and in our surrounding environments impact how we engage in perceiving speech. Encountering new voices and accents that are different from our own improves our ability to attend to speech that varies from our own,” said Kutlu. “We all have accents and embracing this is not hurting our own or others’ speech perception. On the contrary, it helps all of us.”

Participants’ ability to transcribe sentences decreased, and they rated voices as more accented, whenever the speech was paired with a South Asian face – no matter the English variety of the spoken word. Indian English paired with white faces was judged as heavily accented compared to British and American English.

However, these results varied greatly with the listener’s social network and geographic context. Montreal participants, residents of a dual-language city, were overall more accurate when transcribing speech, and they did not change their judgments based on the faces they saw on the screen.
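Transcription accuracy in perception studies like this one is typically scored against the intended sentence; word error rate (WER) is one standard metric. The sketch below is a generic implementation for illustration, not the study’s own scoring code.

```python
# Generic word error rate (WER) scorer, for illustration only.
# WER = word-level edit distance / reference length.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting i words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting j words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(substitution, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# One word wrong out of six -> WER of about 0.17.
print(wer("the cat sat on the mat", "the cat sat on a mat"))
```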

———————– MORE MEETING INFORMATION ———————–
USEFUL LINKS
Main meeting website: https://acousticalsociety.org/asa-meetings/
Technical program: https://eventpilotadmin.com/web/planner.php?id=ASASPRING22
Press Room: https://acoustics.org/world-wide-press-room/

WORLDWIDE PRESS ROOM
In the coming weeks, ASA’s Worldwide Press Room will be updated with additional tips on dozens of newsworthy stories and with lay language papers, which are 300- to 500-word summaries of presentations written by scientists for a general audience and accompanied by photos, audio and video. You can visit the site during the meeting at https://acoustics.org/world-wide-press-room/.

PRESS REGISTRATION
We will grant free registration to credentialed journalists and professional freelance journalists. If you are a reporter and would like to attend, contact AIP Media Services at media@aip.org. For urgent requests, staff at media@aip.org can also help with setting up interviews and obtaining images, sound clips, or background information.

ABOUT THE ACOUSTICAL SOCIETY OF AMERICA
The Acoustical Society of America (ASA) is the premier international scientific society in acoustics devoted to the science and technology of sound. Its 7,000 members worldwide represent a broad spectrum of the study of acoustics. ASA publications include The Journal of the Acoustical Society of America (the world’s leading journal on acoustics), JASA Express Letters, Proceedings of Meetings on Acoustics, Acoustics Today magazine, books, and standards on acoustics. The society also holds two major scientific meetings each year. See https://acousticalsociety.org/.

4aSC3 – Talkers prepare their lips before audibly speaking – Is this the same thing as coarticulated speech?

Peter A. Krause – peter.krause066@csuci.edu
CSU Channel Islands
One University Dr.
Camarillo, CA 93012

Popular version of 4aSC3 – Understanding anticipatory speech postures: does coarticulation extend outside the acoustic utterance?
Presented at 9:45 a.m. Thursday, May 26, 2022
182nd ASA Meeting

A speech sound like /s/ is not fixed. The sound at the beginning of “soon” is not identical to the sound at the beginning of “seen.” We call this contextual variability coarticulation.

Audio recording of “soon” and “seen.” Listen closely to the subtle differences in the initial /s/ sound.

A spectrogram of the same recording of “soon” and “seen.” Note how the /s/ sounds have a slightly different distribution of intensity over the frequency range shown.
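One common way to quantify the difference visible in such spectrograms is the spectral center of gravity of the /s/ noise, which tends to be lower when the lips are rounded. The sketch below is illustrative only; the file names and the /s/ time windows are assumptions, not measurements from this study.

```python
# Illustrative sketch: spectral center of gravity of the /s/ noise in
# "soon" vs. "seen". File names and the /s/ time windows (here the first
# 150 ms) are hypothetical assumptions.
import numpy as np
from scipy.io import wavfile

def spectral_centroid_hz(path, t_start, t_end):
    """Amplitude-weighted mean frequency of a segment of the recording."""
    rate, x = wavfile.read(path)
    seg = x[int(t_start * rate):int(t_end * rate)].astype(np.float64)
    spectrum = np.abs(np.fft.rfft(seg))
    freqs = np.fft.rfftfreq(len(seg), d=1.0 / rate)
    return float(np.sum(freqs * spectrum) / np.sum(spectrum))

# A rounded /s/ (as in "soon") typically shows a lower center of gravity.
print("soon /s/:", spectral_centroid_hz("soon.wav", 0.00, 0.15))
print("seen /s/:", spectral_centroid_hz("seen.wav", 0.00, 0.15))
```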

Some theoretical models explain coarticulation by assuming that talkers retrieve slightly different versions of their /s/ sound from memory, depending on the sound to follow. Others emphasize that articulatory actions overlap in time, rather than keeping to a regimented sequence: Talkers usually start rounding their lips for the /u/ (“oo”) sound in “soon” while still making the hissy sound of the /s/, instead of waiting for the voiced part of the vowel. (See picture below.) But even overlapping action accounts of coarticulation disagree on how “baked in” the coarticulation is. Does the dictionary in your head tell you that the word “soon” is produced with a rounded /s/? Or is there a more general, flexible process whereby if you know an /u/ sound is coming up, you round your lips if able?


A depiction of the same speaker’s lips when producing the /s/ sound in “seen” (top) and the /s/ sound in “soon” (bottom). Note that in the latter case, the lips are already rounded.

If the latter, it is reasonable to ask whether coarticulation only happens during the audible portions of speech. My work suggests that the answer is no! For example, I have shown that during word-reading tasks, talkers tend to pre-round their lips a bit if they have been led to believe that an upcoming (but not yet seen) word will include an /u/ sound. This effect goes away if the word is equally likely to have an /u/ sound or an /i/ (“ee”) sound. More recently, I have shown that talkers awaiting their turn in natural, bi-directional conversation anticipate their upcoming utterance with their lips, by shaping them in sound-specific ways. (At least, they do so when preparing very short phrases like “yeah” or “okay.” For longer phrases, this effect disappears, which remains an interesting mystery.) Nevertheless, talkers apparently “lean forward” into their speech actions some of the time. In my talk I will argue that much of what we call “coarticulation” may be a special case of a more general pattern relating speech planning to articulatory action. In fact, it may reflect processes generally at work in all human action planning.


Plots of lip area taken from my recent study of bi-directional conversation. Plots trace backward in time from the moment at which audible speech began (Latency 0). “Labially constrained” utterances are those requiring shrunken-down lips, like those starting with /p/ or having an early /u/ sound. Note that for short phrases, lip areas are partially set several seconds before audible speech begins.