–The research described in this Acoustics Lay Language Paper may not have yet been peer reviewed–
Singing isn’t just for the stage: everyone enjoys finding their voice in song, whether performing in an auditorium or merely humming in the shower. Singing well is about more than hitting the right notes; it is also about using your voice effectively as an instrument. One technique that professional opera singers master is changing how they pronounce their vowels depending on the pitch they are singing. But why do singers change their vowels? Is it only to sound more beautiful, or is it necessary for hitting these higher notes?
We explore this question by studying what non-professional singers do: if changing the vowels is necessary to reach higher notes, then non-professional singers should do the same at higher pitches. The participants were asked to sing various English vowels across their pitch range, much like a vocal warm-up exercise. These vowels included [i] (like “beat”), [ɛ] (like “bet”), [æ] (like “bat”), [ɑ] (like “bot”), and [u] (like “boot”). Since vowels are made with different tongue gestures, we used ultrasound imaging to capture images of the participants’ tongue positions as they sang. This allowed us to see how the tongue moved across different pitches and vowels.
We found that participants who could sing a wider range of pitches did indeed adjust their tongue shapes when reaching high notes. The trend held even when we isolated the participants who said they had never sung in a choir or a cappella group. In contrast, participants who could not sing a wide pitch range generally did not change their vowels based on pitch.
We then compared this to pilot data from an operatic soprano, who showed gradual adjustments in tongue positions across her whole pitch range, effectively neutralising the differences between vowels at her highest pitches. In other words, all the vowels at her highest pitches sounded very similar to each other.
Overall, these findings suggest that changing our mouth shape and tongue position may be necessary when singing high pitches. The way singers modify their vowels could be an essential part of achieving a well-balanced, efficient voice, especially for hitting high notes. By better understanding how vowels and pitch interact, this research opens the door to further studies on how singers use their vocal instruments and what the keys to effective voice production are. Together, these results offer insights not only into our appreciation of the art of singing, but also into the complex mechanisms of human vocal production.
Video 1: Example of sung vowels at relatively lower pitches.
Video 2: Example of sung vowels at relatively higher pitches.
–The research described in this Acoustics Lay Language Paper may not have yet been peer reviewed–
It’s much easier to understand what others are saying if you’re listening to a close friend or family member rather than a stranger. And if you practice listening to the voices of people you’ve never met before, you might become better at understanding them too.
In our research, we ask people to visit the lab with a friend or partner. We record their voices while they read sentences aloud. We then invite the volunteers back for a listening test. During the test, they hear sentences and click words on a screen to show what they heard. This is made more difficult by playing a second sentence at the same time, which the volunteers are told to ignore. This is like having a conversation when there are other people talking around you. Our volunteers listen to many sentences over the course of the experiment. Sometimes, the sentence is one recorded from their friend or partner. Other times, it’s one recorded from someone they’ve never met. Our studies have shown that people are best at understanding the sentences spoken by their friend or partner.
In one study, we manipulated the sentence recordings to change the sound of the voices. The voices still sounded natural, yet volunteers could no longer recognize them as their friend or partner. Even so, participants remained better at understanding these sentences, even though they didn’t recognize the voice.
In other studies, we’ve investigated how people learn to become familiar with new voices. Each volunteer learns the names of three new people. They’ve never met these people, but we play them lots of recordings of their voices. This is like when you listen to a new podcast or radio show. We’ve found that listeners become very good at understanding these new speakers. In other words, we can train people to become familiar with new voices.
In new work that hasn’t yet been published, we found that voice familiarization training benefits both older and younger people. So, it may help older people who find it very difficult to listen in noisy places. Many environments contain background noise, from office parties to hospitals and train stations. Ultimately, we hope that we can familiarize people with voices they hear in their daily lives, to make it easier to listen in noisy places.
Flinders University, GPO Box 2100, Adelaide, SA, 5001, Australia
Popular version of 1pSC6 – On the Small Flat Vowel Systems of Australian Languages
Presented at the 185th ASA Meeting
Read the abstract at https://doi.org/10.1121/10.0022855
Please keep in mind that the research described in this Lay Language Paper may not have yet been peer reviewed.
Australia originally had 250–350 Aboriginal languages. Today, about 20 of these survive, and none has more than 5,000 speakers. Most of the original languages shared very similar sound systems. About half of them had just three vowels, another 10% or so had four, and a further 25% or so had a five-vowel system. Only 16% of the world’s languages have a vowel inventory of four or fewer (the average number is six; some Germanic languages, such as Danish, have 20 or so).
This paper asks why many Australian languages have so few vowels. Our research shows that the vowels of Aboriginal languages are much more “squashed down” in the acoustic space than those of European languages (Fig 1), indicating that the tongue does not come as close to the roof of the mouth as in European languages. The two “closest” vowels are [e] (a sound with the tongue at the front of the mouth, between “pit” and “pet”) and [o] (at the back of the mouth with rounded lips, between “put” and “pot”). The “open” (low-tongue) vowel is best transcribed [ɐ], a sound between “pat” and “putt”, but with a less open jaw. Four- and five-vowel systems squeeze the extra vowels in between these, adding [ɛ] (between “pet” and “pat”) and [ɔ] (more or less exactly as in “pot”), with little or no expansion of the acoustic space. Thus, the majority of Australian languages lack any true close (high-tongue) vowels (as in “peat” and “pool”).
So why do Australian languages have a “flattened” vowel space? The answer may lie in the ears of the speakers rather than in their mouths. Aboriginal Australians have by far the highest prevalence of chronic middle ear infection in the world. Our research with Aboriginal groups of diverse age, language and geographical location shows 30–60% of speakers have a hearing impairment in one or both ears (Fig 2). Nearly all Aboriginal language groups have developed an alternate sign language to complement the spoken one. Our previous analysis has shown that the sound systems of Australian languages resemble those of individual hearing-impaired children in several important ways, leading us to hypothesise that the consonant systems and the word structure of these languages have been influenced by the effects of chronic middle ear infection over generations.
A reduction in the vowel space is another of these resemblances. Middle ear infection affects the low-frequency end of the scale (under 500 Hz), thus reducing the prominence of the distinctive lower resonances of close vowels, such as in “peat” and “pool” (Fig 3). It is possible that, over generations, speakers have raised the frequencies of these resonances to make them more audible, thereby constricting the acoustic space the languages use. If so, we may ask whether, on purely acoustic grounds, communicating in an Aboriginal language in the classroom, using a sound system optimally attuned to the typical hearing profile of the speech community, might offer improved educational outcomes for Indigenous children in the early years.
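The “squashed” acoustic space described above can be quantified as the area of the polygon spanned by each vowel’s first two resonances (the formants F1 and F2). Below is a minimal sketch of that comparison; the formant values are hypothetical round numbers chosen for illustration, not measurements from this study.

```python
import numpy as np
from scipy.spatial import ConvexHull

# Hypothetical F1/F2 formant values in Hz, for illustration only (not study data).
english_like = {   # a typical five-vowel space with true close vowels
    "i": (280, 2250), "e": (480, 1900), "a": (750, 1300),
    "o": (500, 900), "u": (310, 870),
}
flat_system = {    # a "squashed" three-vowel system with no close vowels
    "e": (480, 1900), "o": (500, 1000), "ɐ": (650, 1400),
}

def vowel_space_area(vowels):
    """Polygon area (Hz^2) of the convex hull over the (F1, F2) points."""
    pts = np.array(list(vowels.values()), dtype=float)
    return ConvexHull(pts).volume  # in 2-D, .volume is the enclosed area

print(f"English-like space: {vowel_space_area(english_like):,.0f} Hz^2")
print(f"Flat system:        {vowel_space_area(flat_system):,.0f} Hz^2")
```

A smaller hull area is one concrete sense in which a vowel system is “flattened”: the same number of vowel categories is packed into less acoustic room.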
Popular version of 1aSC2 – Retroflex nasals in the Mai-Ndombe (DRC): the case of nasals in North Boma B82
Presented at the 185th ASA Meeting
Read the abstract at https://doi.org/10.1121/10.0022724
Please keep in mind that the research described in this Lay Language Paper may not have yet been peer reviewed.
“All language sounds are equal, but some language sounds are more equal than others” – or, at least, that is the case in academia. While French i’s and English t’s are constantly re-dotted and re-crossed, the vast majority of the world’s linguistic communities remain undocumented, with their unique sound heritage gradually fading into silence. The preservation of humankind’s linguistic diversity relies solely on detailed documentation and description.
Over the past few years, a team of linguists from Ghent, Mons, and Kinshasa has dedicated its efforts to recording the phonetic and phonological oddities of southwest Congo’s Bantu varieties. Among these, North Boma (Figure 1) stands out for its display of rare sounds known as “retroflexes”. These sounds are particularly rare in central Africa, which mirrors a more general state of under-documentation of the area’s sound inventories. Through extensive fieldwork in the North Boma area, meticulous data analysis, and advanced statistical processing, these researchers have unveiled the first comprehensive account of North Boma’s retroflexes. As it turns out, North Boma retroflexes are exclusively nasal, a striking typological circumstance. Their work, presented in Sydney this year, not only enriches our understanding of these unique consonants but also unveils potential historical implications behind their prevalence in the region.
Figure 1 – the North Boma area
The study highlights the remarkable salience of North Boma’s retroflexes, characterised by distinct acoustic features that sometimes align with and sometimes deviate from those reported in the existing literature. This is clearly shown in Figure 2, where the North Boma nasal space is plotted using a technique known as “Multiple Factor Analysis”, which allows for the study of small corpora organised into clear variable groups. As can be seen, the behaviour of the retroflexes differs greatly from that of the other nasals of North Boma. This uniqueness also suggests that their presence in the area may stem from interactions with long-lost hunter-gatherer forest languages, providing invaluable insights into the region’s history.
Figure 2 – MFA results show that retroflex and non-retroflex nasals behave very differently in North Boma
Extraordinary sound patterns are waiting to be discovered in the least documented language communities of the world. North Boma is just one compelling example among many. As we head towards an unprecedented language loss crisis, the need for detailed phonetic documentation becomes increasingly evident.
Achieving Linguistic Justice for African American English #ASA184
African American English varies systematically and is internally consistent; a proper understanding of this variation prevents the misdiagnosis of speech and language disorder.
Media Contact: Ashley Piccone AIP Media 301-209-3090 media@aip.org
CHICAGO, May 10, 2023 – African American English (AAE) is a variety of English spoken primarily, though not exclusively, by Black Americans of historical African descent. Because AAE varies from white American English (WAE) in a systematic way, it is possible that speech and hearing specialists unfamiliar with the language variety could misidentify differences in speech production as speech disorder. Professional understanding of the difference between typical variation and errors in the language system is the first step for accurately identifying disorder and establishing linguistic justice for AAE speakers.
(left) 5-year-old AAE girl’s production of “elephant.” When the /t/ sound in /nt/ is produced, the AAE speaker produces less aspiration noise, and the /t/ sound lasts a shorter time relative to the WAE /t/ production. The duration of the word is 740 milliseconds (0.74 seconds). (right) 5-year-old WAE girl’s production of “elephant.” When the /t/ sound in /nt/ is produced, the WAE speaker produces a lot of aspiration noise, and the /t/ sound lasts a longer time relative to the AAE /t/ production. The duration of the entire word is 973 milliseconds (0.97 seconds). Both girls produce intelligible versions of the word “elephant.”
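Durations like the 740 and 973 milliseconds above come from locating the word boundaries in the recorded waveform. The sketch below illustrates the idea on a synthetic signal using a crude frame-energy threshold; the sample rate, frame size, and threshold are illustrative assumptions, not the study’s actual analysis settings.

```python
import numpy as np

SR = 16_000  # assumed sample rate (Hz)

def word_duration_ms(signal, sr=SR, frame_ms=10, thresh_ratio=0.05):
    """Duration between the first and last frames whose RMS energy exceeds
    a fraction of the loudest frame (a crude word-boundary finder)."""
    frame = int(sr * frame_ms / 1000)
    n = len(signal) // frame
    rms = np.array([np.sqrt(np.mean(signal[i * frame:(i + 1) * frame] ** 2))
                    for i in range(n)])
    active = np.where(rms > thresh_ratio * rms.max())[0]
    return (active[-1] - active[0] + 1) * frame_ms

# Synthetic stand-in for a recording: 200 ms of silence, 740 ms of
# "speech" (a 200 Hz tone), then 300 ms of silence.
t = np.arange(int(0.74 * SR)) / SR
speech = np.sin(2 * np.pi * 200 * t)
signal = np.concatenate([np.zeros(int(0.2 * SR)), speech, np.zeros(int(0.3 * SR))])

print(word_duration_ms(signal))  # 740
```

Real measurements of fine-grained cues such as aspiration are usually made by hand or with forced alignment in tools like Praat, but the principle of segmenting the waveform and measuring interval durations is the same.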
In her presentation, “Kids talk too: Linguistic justice and child African American English,” Yolanda Holt of East Carolina University will describe aspects of the systematic variation between AAE and WAE speech production in children. The talk will take place Wednesday, May 10, at 10:50 a.m. Eastern U.S. in the Los Angeles/Miami/Scottsdale room, as part of the 184th Meeting of the Acoustical Society of America running May 8-12 at the Chicago Marriott Downtown Magnificent Mile Hotel.
Common characteristics of AAE speech include variation at all linguistic levels, from sound production at the word level to the choice of commentary in professional interpersonal interactions. A frequent feature of AAE is final consonant reduction/deletion and final consonant cluster reduction. Holt provided the following example to illustrate word level to interpersonal level linguistic variation.
“In the professional setting, if one AAE-speaking professional woman wanted to compliment the attire of the other, the exchange might sound something like this: [Speaker 1] ‘I see you rockin’ the tone on tone.’ [Speaker 2] ‘Frien’, I’m jus’ tryin’ to be like you wit’ the fully executive flex.’”
This example, in addition to using common aspects of AAE word shape, shows how the choice to use AAE in a professional setting is a way for the two women to share a message beyond the words.
“This exchange illustrates a complex and nuanced cultural understanding between the two speakers. In a few words, they communicate professional respect and a subtle appreciation for the intricate balance that African American women navigate in bringing their whole selves to the corporate setting,” said Holt.
Holt and her team examined final consonant cluster reduction (e.g., expressing “shift” as “shif’”) in 4- and 5-year-old children. Using instrumental acoustic phonetic analysis, they discovered that the variation in final consonant production in AAE is likely not a wholesale elimination of word endings but is perhaps a difference in aspects of articulation.
“This is an important finding because it could be assumed that if a child does not fully articulate the final sound, they are not aware of its existence,” said Holt. “By illustrating that the AAE-speaking child produces a variation of the final sound, not a wholesale removal, we help to eliminate the mistaken idea that AAE speakers don’t know the ending sounds exist.”
Holt believes the fields of speech and language science, education, and computer science should expect and accept such variation in human communication. Linguistic justice occurs when we accept variation in human language without penalizing the user or defining their speech as “wrong.”
“Language is alive. It grows and changes over each generation,” said Holt. “Accepting the speech and language used by each generation and each group of speakers is an acceptance of the individual, their life, and their experience. Acceptance, not tolerance, is the next step in the march towards linguistic justice. For that to occur, we must learn from our speakers and educate our professionals that different can be typical. It is not always disordered.”
ASA PRESS ROOM In the coming weeks, ASA’s Press Room will be updated with newsworthy stories and the press conference schedule at https://acoustics.org/asa-press-room/.
LAY LANGUAGE PAPERS ASA will also share dozens of lay language papers about topics covered at the conference. Lay language papers are 300 to 500 word summaries of presentations written by scientists for a general audience. They will be accompanied by photos, audio, and video. Learn more at https://acoustics.org/lay-language-papers/.
PRESS REGISTRATION ASA will grant free registration to credentialed and professional freelance journalists. If you are a reporter and would like to attend the meeting or virtual press conferences, contact AIP Media Services at media@aip.org. For urgent requests, AIP staff can also help with setting up interviews and obtaining images, sound clips, or background information.
ABOUT THE ACOUSTICAL SOCIETY OF AMERICA The Acoustical Society of America (ASA) is the premier international scientific society in acoustics devoted to the science and technology of sound. Its 7,000 members worldwide represent a broad spectrum of the study of acoustics. ASA publications include The Journal of the Acoustical Society of America (the world’s leading journal on acoustics), JASA Express Letters, Proceedings of Meetings on Acoustics, Acoustics Today magazine, books, and standards on acoustics. The society also holds two major scientific meetings each year. See https://acousticalsociety.org/.
Experiments show how speech and comprehension change when people communicate with artificial intelligence.
Media Contact: Ashley Piccone AIP Media 301-209-3090 media@aip.org
CHICAGO, May 9, 2023 – Millions of people now regularly communicate with AI-based devices, such as smartphones, speakers, and cars. Studying these interactions can improve AI’s ability to understand human speech and determine how talking with technology impacts language.
In their talk, “Clear speech in the new digital era: Speaking and listening clearly to voice-AI systems,” Georgia Zellou and Michelle Cohn of the University of California, Davis will describe experiments to investigate how speech and comprehension change when humans communicate with AI. The presentation will take place Tuesday, May 9, at 12:40 p.m. Eastern U.S. in the Los Angeles/Miami/Scottsdale room, as part of the 184th Meeting of the Acoustical Society of America running May 8-12 at the Chicago Marriott Downtown Magnificent Mile Hotel.
Humans change their voice when communicating with AI. Credit: Michelle Cohn
In their first line of questioning, Zellou and Cohn examined how people adjust their voice when communicating with an AI system compared to talking with another human. They found the participants produced louder and slower speech with less pitch variation when they spoke to voice-AI (e.g., Siri, Alexa), even across identical interactions.
On the listening side, the researchers showed that how humanlike a device sounds affects how well listeners understand it. If listeners think the voice they hear is a device, they are less able to understand it accurately. If it sounds more humanlike, their comprehension increases. Clear speech, like the style of a newscaster, was better understood overall, even when machine-generated.
“We do see some differences in patterns across human- and machine-directed speech: People are louder and slower when talking to technology. These adjustments are similar to the changes speakers make when talking in background noise, such as in a crowded restaurant,” said Zellou. “People also have expectations that the systems will misunderstand them and that they won’t be able to understand the output.”
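Adjustments like “louder,” “slower,” and “less pitch variation” are typically summarised per utterance as RMS intensity, duration, and the spread of the fundamental frequency (F0). The sketch below computes these three summaries; the signals and F0 contours are made-up stand-ins for illustration, not data from these experiments.

```python
import numpy as np

def utterance_measures(samples, f0_contour_hz, sr=16_000):
    """Per-utterance summaries of the kind compared across speaking styles:
    loudness (dB RMS), duration (s), and pitch variability (SD of F0)."""
    rms_db = 20 * np.log10(np.sqrt(np.mean(samples ** 2)))
    return {
        "loudness_db": float(rms_db),
        "duration_s": len(samples) / sr,
        "f0_sd_hz": float(np.std(f0_contour_hz)),
    }

rng = np.random.default_rng(0)
# Made-up stand-ins: machine-directed speech is louder, longer (slower),
# and has a flatter (less variable) pitch contour than human-directed speech.
human = utterance_measures(0.05 * rng.standard_normal(16_000),
                           f0_contour_hz=[180, 220, 160, 240, 200])
machine = utterance_measures(0.10 * rng.standard_normal(24_000),
                             f0_contour_hz=[195, 205, 190, 210, 200])

print("human-directed:  ", human)
print("machine-directed:", machine)
```

In practice F0 contours would be extracted from the audio with a pitch tracker rather than supplied by hand, but the comparison of the summary measures works the same way.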
Clarifying what makes a speaker intelligible will be useful for voice technology. For example, these results suggest that text-to-speech voices should adopt a “clear” style in noisy conditions.
Looking forward, the team aims to apply these studies to people from different age groups and social and language backgrounds. They also want to investigate how people learn language from devices and how linguistic behavior adapts as technology changes.
“There are so many open questions,” said Cohn. “For example, could voice-AI be a source of language change among some speakers? As technology advances, such as with large language models like ChatGPT, the boundary between human and machine is changing – how will our language change with it?”