Machine Listening: Making Speech Recognition Systems More Inclusive

Study explores how African American English speakers adapt their speech to be understood by voice technology.

African American English speakers adjust rate and pitch based on audience. Credit: Michelle Cohn, Zion Mengesha, Michal Lahav, and Courtney Heldreth

WASHINGTON, April 30, 2024 – Interactions with voice technology, such as Amazon’s Alexa, Apple’s Siri, and Google Assistant, can make life easier by increasing efficiency and productivity. However, errors in generating and understanding speech during interactions are common. When using these devices, speakers often style-shift their speech from their normal patterns into a louder and…

From: JASA Express Letters
Article: African American English speakers’ pitch variation and rate adjustments for imagined technological and human addressees
DOI: 10.1121/10.0025484

Achieving Linguistic Justice for African American English #ASA184

African American English varies systematically and is internally consistent; a proper understanding of this variation prevents the misdiagnosis of speech and language disorder.

Media Contact:
Ashley Piccone
AIP Media
301-209-3090
media@aip.org

CHICAGO, May 10, 2023 – African American English (AAE) is a variety of English spoken primarily, though not exclusively, by Black Americans of historical African descent. Because AAE varies from white American English (WAE) in a systematic way, speech and hearing specialists unfamiliar with the language variety could misidentify differences in speech production as a speech disorder. Professional understanding of the difference between typical variation and errors in the language system is the first step toward accurately identifying disorders and establishing linguistic justice for AAE speakers.

(left) A 5-year-old AAE-speaking girl’s production of “elephant.” When producing the /t/ sound in /nt/, the AAE speaker produces less aspiration noise, and the /t/ sound lasts for a shorter time than in the WAE production. The duration of the entire word is 740 milliseconds (0.74 seconds). (right) A 5-year-old WAE-speaking girl’s production of “elephant.” When producing the /t/ sound in /nt/, the WAE speaker produces a lot of aspiration noise, and the /t/ sound lasts for a longer time than in the AAE production. The duration of the entire word is 973 milliseconds (0.97 seconds). Both girls have intelligible productions of the word “elephant.”

In her presentation, “Kids talk too: Linguistic justice and child African American English,” Yolanda Holt of East Carolina University will describe aspects of the systematic variation between AAE and WAE speech production in children. The talk will take place Wednesday, May 10, at 10:50 a.m. Eastern U.S. in the Los Angeles/Miami/Scottsdale room, as part of the 184th Meeting of the Acoustical Society of America running May 8-12 at the Chicago Marriott Downtown Magnificent Mile Hotel.

Common characteristics of AAE speech include variation at all linguistic levels, from sound production at the word level to the choice of commentary in professional interpersonal interactions. Frequent features of AAE are final consonant reduction or deletion and final consonant cluster reduction. Holt provided the following example to illustrate linguistic variation from the word level to the interpersonal level.

“In the professional setting, if one AAE-speaking professional woman wanted to compliment the attire of the other, the exchange might sound something like this: [Speaker 1] ‘I see you rockin’ the tone on tone.’ [Speaker 2] ‘Frien’, I’m jus’ tryin’ to be like you wit’ the fully executive flex.’”

This example, in addition to using common aspects of AAE word shape, shows how the choice to use AAE in a professional setting is a way for the two women to share a message beyond the words.

“This exchange illustrates a complex and nuanced cultural understanding between the two speakers. In a few words, they communicate professional respect and a subtle appreciation for the intricate balance that African American women navigate in bringing their whole selves to the corporate setting,” said Holt.

Holt and her team examined final consonant cluster reduction (e.g., expressing “shift” as “shif’”) in 4- and 5-year-old children. Using instrumental acoustic phonetic analysis, they discovered that the variation in final consonant production in AAE is likely not a wholesale elimination of word endings but is perhaps a difference in aspects of articulation.

“This is an important finding because it could be assumed that if a child does not fully articulate the final sound, they are not aware of its existence,” said Holt. “By illustrating that the AAE-speaking child produces a variation of the final sound, not a wholesale removal, we help to eliminate the mistaken idea that AAE speakers don’t know the ending sounds exist.”

Holt believes the fields of speech and language science, education, and computer science should expect and accept such variation in human communication. Linguistic justice occurs when we accept variation in human language without penalizing the user or defining their speech as “wrong.”

“Language is alive. It grows and changes over each generation,” said Holt. “Accepting the speech and language used by each generation and each group of speakers is an acceptance of the individual, their life, and their experience. Acceptance, not tolerance, is the next step in the march towards linguistic justice. For that to occur, we must learn from our speakers and educate our professionals that different can be typical. It is not always disordered.”

———————– MORE MEETING INFORMATION ———————–
Main meeting website: https://acousticalsociety.org/asa-meetings/
Technical program: https://eppro02.ativ.me/web/planner.php?id=ASASPRING23&proof=true

ASA PRESS ROOM
In the coming weeks, ASA’s Press Room will be updated with newsworthy stories and the press conference schedule at https://acoustics.org/asa-press-room/.

LAY LANGUAGE PAPERS
ASA will also share dozens of lay language papers about topics covered at the conference. Lay language papers are 300- to 500-word summaries of presentations written by scientists for a general audience. They will be accompanied by photos, audio, and video. Learn more at https://acoustics.org/lay-language-papers/.

PRESS REGISTRATION
ASA will grant free registration to credentialed and professional freelance journalists. If you are a reporter and would like to attend the meeting or virtual press conferences, contact AIP Media Services at media@aip.org. For urgent requests, AIP staff can also help with setting up interviews and obtaining images, sound clips, or background information.

ABOUT THE ACOUSTICAL SOCIETY OF AMERICA
The Acoustical Society of America (ASA) is the premier international scientific society in acoustics devoted to the science and technology of sound. Its 7,000 members worldwide represent a broad spectrum of the study of acoustics. ASA publications include The Journal of the Acoustical Society of America (the world’s leading journal on acoustics), JASA Express Letters, Proceedings of Meetings on Acoustics, Acoustics Today magazine, books, and standards on acoustics. The society also holds two major scientific meetings each year. See https://acousticalsociety.org/.