–The research described in this Acoustics Lay Language Paper may not have yet been peer reviewed–
If you listen to two different sounds of similar pitch, one in each ear, something strange happens. The two sounds blend perceptually into the illusion of a new sound, similar to what happens when different colors are presented to the two eyes.
For example, if you listen here with stereo headphones to the vowels “ah” as in hot and “ee” as in heed, spoken by two different talkers with different voice pitch – a male talker and a female talker – you will hear two vowels.
Figure 1. Perception when two different vowels are played to the two ears at different pitches. Play different pitch example. Note: Stereo headphones are necessary to experience the illusion
But if these same vowels are spoken by the same talker, you will experience something called binaural fusion (Reiss and Molis, 2021). Instead of hearing two different vowels, you will hear a single new vowel. This new vowel will be a blend of the two original vowels, something in between like “eh” as in head.
Figure 2. Perception when two different vowels are played to the two ears at the same pitch. Play same pitch example. Note: Stereo headphones are necessary to experience the illusion
This illusion is not confined to steady sounds. It also happens for fluctuating sounds, such as a tone that fluctuates in one ear while staying steady in the other ear. The fusion makes it difficult to localize the fluctuating tone, that is, to tell which side it is coming from.
While we know that people experience binaural fusion, we don’t know what happens in the brain so that some sounds fuse while others are heard as distinct. It’s hard to measure detailed brain activity in humans, so we are now studying what happens in the brain of animals, in this case ferrets, when they experience the same illusion. The first thing we had to do was demonstrate that ferrets perceive these illusions the same way as humans. For vowels, ferrets were first trained to indicate when they heard the vowel “eh”, and to ignore the vowels “ah” and “ee”. When “ah” and “ee” were played to the two ears at the same pitch, the ferrets responded that they heard “eh”. Similarly, for fluctuating tones, ferrets were trained to indicate the side where they heard the fluctuating tone, and they experienced the same difficulties as human listeners.
As a next step, recordings from cells in the brain will reveal how brain activity leads to these illusory phenomena. Binaural fusion and the converse, binaural fission, are important to understand because together they underlie how the brain groups components of sound that belong to one source, such as a single talker, and separates those that belong to different sources, such as other talkers (Bregman, 1990; Bronkhorst, 2000).
Studies have shown that people with hearing loss, including those with cochlear implants, often experience excessive binaural fusion and fuse voices of different pitch together (Reiss et al., 2014; 2017; 2018). Excessive binaural fusion explains a large portion of the difficulties these listeners have understanding speech in noisy environments (Oh et al., 2022; 2023). Understanding how brain circuits encode binaural fusion and fission will show us how to train or rewire the brain to help people with hearing loss and other auditory processing disorders.
In the meantime, think about how you can come up with other new illusory sounds by combining two different sounds of the same pitch!
Works cited
Bregman, A. S. (1990). Auditory Scene Analysis (MIT Press, Cambridge, MA).
Bronkhorst, A. W. (2000). The cocktail party phenomenon: A review of research on speech intelligibility in multiple-talker conditions. Acta Acustica united with Acustica, 86(1), 117–128.
Oh, Y., Hartling, C. L., Srinivasan, N. K., Diedesch, A. C., Gallun, F. J., and Reiss, L. A. J. (2022). Factors underlying masking release by voice-gender differences and spatial separation cues in multi-talker listening environments in listeners with and without hearing loss. Frontiers in Neuroscience, 16, 1059639.
Oh, Y., Srinivasan, N. K., Hartling, C. L., Gallun, F. J., and Reiss, L. A. J. (2023). Differential effects of binaural pitch fusion range on the benefits of voice gender differences in a ‘cocktail party’ environment for bimodal and bilateral cochlear implant users. Ear and Hearing, 44(2), 318–329.
Reiss, L. A., Fowler, J. R., Hartling, C. L., and Oh, Y. (2018). Binaural pitch fusion in bilateral cochlear implant users. Ear and Hearing, 39(2), 390–397.
Reiss, L. A., Ito, R. A., Eggleston, J. L., and Wozny, D. R. (2014). Abnormal binaural spectral integration in cochlear implant users. Journal of the Association for Research in Otolaryngology, 15(2), 235–248.
Reiss, L. A. J., and Molis, M. R. (2021). An alternative explanation for difficulties with speech in background talkers: Abnormal fusion of vowels across fundamental frequency and ears. Journal of the Association for Research in Otolaryngology, 22(4), 443–461.
Reiss, L. A., Shayman, C. S., Walker, E. P., Bennett, K. O., Fowler, J. R., Hartling, C. L., Glickman, B., Lasarev, M. R., and Oh, Y. (2017). Binaural pitch fusion: Comparison of normal-hearing and hearing-impaired listeners. Journal of the Acoustical Society of America, 141(3), 1909–1920.
Popular version of 1aMU8 – Effect of years of voice training on chest and head register tongue shape variability
Presented at the 187th ASA Meeting
Read the abstract at https://doi.org/10.1121/10.0034945
–The research described in this Acoustics Lay Language Paper may not have yet been peer reviewed–
Imagine being in a voice lesson, and as you try to hit a high note, your voice coach says, “suppress your tongue” or “pretend your tongue doesn’t exist!” What does this mean, and why do singers do this?
One vocal technique used by professional singers is singing in different vocal registers. Generally, a man’s natural speaking voice and the voice people use to sing lower notes are called the chest voice: you can feel a vibration in your chest if you place your hand over it as you vocalize. When moving to higher notes, singers shift to their head voice, where vibrations feel stronger in the head. But what role does the tongue play in this transition? Do all singers, including amateurs, naturally adjust their tongue when switching registers, or is this adjustment a learned skill?
Figure 1: Approximate location of feeling/sensation for chest and head voice.
We are interested in vowels and in the pitch range around the passaggio, the transition point between vocal registers. The voice is very unstable and prone to audible cracking in the passaggio, and singers are trained to navigate it smoothly. We also know that different vowels are produced at different locations in the mouth and have different qualities. One way that singers successfully navigate the passaggio is by altering the vowel through slight adjustments to tongue shape. To study this, we used ultrasound imaging to monitor the position and shape of the tongue while participants with varying levels of vocal training sang vowels across their pitch range, similar to a vocal warm-up.
Video 1: Example of ultrasound recording
The results indicated that, in head voice, the tongue is generally positioned higher in the mouth than in chest voice. Unsurprisingly, this difference is more pronounced for certain vowels than for others.
Figure 2: Tongue position in chest and head voice for front and back vowel groups. Overlapping shades indicate that there is virtually no difference.
Singers’ tongues are also shaped by training. Recall the voice coach’s advice to suppress your tongue: singers are taught to lower the jaw and tongue while singing to create more space in the mouth, which enhances resonance and vocal projection. Indeed, trained singers generally have a lower overall tongue position.
Because professional singers’ transitions between registers sound more seamless, we expected trained singers to show smaller differences in tongue position between registers than untrained singers, who have less developed tongue control. In fact, the opposite turned out to be true: the tongue behaves differently in chest voice and head voice, but only in individuals with vocal training.
Figure 3: Tongue position in chest and head voice for singers with different levels of training.
In summary, our research suggests that tongue adjustments for register shifts may be a learned technique. The manner in which singers adjust their tongues for different vowels and vocal registers could be an essential component in achieving a seamless transition between registers, as well as in the effective use of various vocal qualities. Understanding the interactions among vowels, registers, and the tongue provides insight into the mechanisms of human vocal production and voice pedagogy.
University of Pennsylvania, 3401-C Walnut Street, Suite 300, C Wing, Philadelphia, PA, 19104, United States
Jianjing Kuang
Popular version of 4aMU6 – Ultrasound tongue imaging of vowel spaces across pitches in singing
Presented at the 186th ASA Meeting
Read the abstract at https://doi.org/10.1121/10.0027410
–The research described in this Acoustics Lay Language Paper may not have yet been peer reviewed–
Singing isn’t just for the stage – everyone enjoys finding their voice in song, whether performing in an auditorium or merely humming in the shower. Singing well is more than just hitting the right notes; it’s also about using your voice effectively as an instrument. One technique that professional opera singers master is changing how they pronounce their vowels based on the pitch they are singing. But why do singers change their vowels? Is it only to sound more beautiful, or is it necessary to hit the higher notes?
We explore this question by studying what non-professional singers do: if changing the vowels is necessary to reach higher notes, then non-professional singers should also change their vowels at higher notes. The participants were asked to sing various English vowels across their pitch range, much like a vocal warm-up exercise. These vowels included [i] (like “beat”), [ɛ] (like “bet”), [æ] (like “bat”), [ɑ] (like “bot”), and [u] (like “boot”). Since vowels are made by different tongue gestures, we used ultrasound imaging to capture images of the participants’ tongue positions as they sang. This allowed us to see how the tongue moved across different pitches and vowels.
We found that participants who managed to sing more pitches did indeed adjust their tongue shapes when reaching high notes. The trend held even when we isolated the participants who said they had never sung in a choir or a cappella group. Those who are able to sing at higher pitches try to adjust their vowels at those pitches. In contrast, participants who cannot sing a wide pitch range generally do not change their vowels based on pitch.
We then compared this to pilot data from an operatic soprano, who showed gradual adjustments in tongue positions across her whole pitch range, effectively neutralising the differences between vowels at her highest pitches. In other words, all the vowels at her highest pitches sounded very similar to each other.
Overall, these findings suggest that changing our mouth shape and tongue position may be necessary when singing high pitches. The way singers modify their vowels could be an essential part of achieving a well-balanced, efficient voice, especially for hitting high notes. By better understanding how vowels and pitch interact with each other, this research opens the door to further studies on how singers use their vocal instruments and what the keys to effective voice production are. Together, this research offers insights not only into our appreciation of the art of singing, but also into the complex mechanisms of human vocal production.
Video 1: Example of sung vowels at relatively lower pitches.
Video 2: Example of sung vowels at relatively higher pitches.
Flinders University, GPO Box 2100, Adelaide, SA, 5001, Australia
Popular version of 1pSC6 – On the Small Flat Vowel Systems of Australian Languages
Presented at the 185th ASA Meeting
Read the abstract at https://doi.org/10.1121/10.0022855
Please keep in mind that the research described in this Lay Language Paper may not have yet been peer reviewed.
Australia originally had 250–350 Aboriginal languages. Today, about 20 of these survive and none has more than 5,000 speakers. Most of the original languages shared very similar sound systems. About half of them had just three vowels, another 10% or so had four, and a further 25% or so had a five-vowel system. Only 16% of the world’s languages have a vowel inventory of four or fewer (the average number is six; some Germanic languages, such as Danish, have 20 or so).
This paper asks why many Australian languages have so few vowels. Our research shows that the vowels of Aboriginal languages are much more ‘squashed down’ in the acoustic space than those of European languages (Fig 1), indicating that the tongue does not come as close to the roof of the mouth as it does in European languages. The two ‘closest’ vowels are [e] (a sound with the tongue at the front of the mouth, between ‘pit’ and ‘pet’) and [o] (at the back of the mouth with rounded lips, between ‘put’ and ‘pot’). The ‘open’ (low-tongue) vowel is best transcribed [ɐ], a sound between ‘pat’ and ‘putt’, but with a less open jaw. Four- and five-vowel systems squeeze the extra vowels in between these, adding [ɛ] (between ‘pet’ and ‘pat’) and [ɔ] (more or less exactly as in ‘pot’), with little or no expansion of the acoustic space. Thus, the majority of Australian languages lack any true close (high-tongue) vowels (as in ‘peat’ and ‘pool’).
So why do Australian languages have a ‘flattened’ vowel space? The answer may lie in the ears of the speakers rather than in their mouths. Aboriginal Australians have by far the highest prevalence of chronic middle ear infection in the world. Our research with Aboriginal groups of diverse ages, languages and geographical locations shows that 30–60% of speakers have a hearing impairment in one or both ears (Fig 2). Nearly all Aboriginal language groups have developed an alternate sign language to complement the spoken one. Our previous analysis has shown that the sound systems of Australian languages resemble those of individual hearing-impaired children in several important ways, leading us to hypothesise that the consonant systems and the word structure of these languages have been influenced by the effects of chronic middle ear infection over generations.
A reduction in the vowel space is another of these resemblances. Middle ear infection affects the low-frequency end of the scale (under 500 Hz), thus reducing the prominence of the distinctive lower resonances of close vowels, such as in ‘peat’ and ‘pool’ (Fig 3). It is possible that, over generations, speakers have raised the frequencies of these resonances to make them more audible, thereby constricting the acoustic space the languages use. If so, we may ask whether, on purely acoustic grounds, communicating in an Aboriginal language in the classroom, using a sound system optimally attuned to the typical hearing profile of the speech community, might offer improved educational outcomes for Indigenous children in the early years.
Ali Abavisani – aliabavi@illinois.edu; Jont B. Allen – jontalle@illinois.edu; Dept. of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, 405 N Mathews Ave, Urbana, IL 61801
Popular version of paper 1aPP, presented Monday, May 13, 2019, at the 177th ASA Meeting, Louisville, KY
Hearing loss can have a serious impact on the social lives of those who experience it. The effect of hearing loss becomes more complicated in environments such as restaurants, where the background noise is similar to speech. Although hearing aids of various designs are intended to address these issues, users complain about hearing aid performance in social situations, where the aids are most needed. Part of this problem stems from the nature of hearing aids: speech is not used as part of their design and fitting process. If speech sounds in real-life conditions could be incorporated into the fitting process of hearing aids, it may be possible to address most of the shortcomings that frustrate users.
There have been many studies on the features that are important in identifying speech sounds such as isolated consonant + vowel (CV) phones (i.e., meaningless speech sounds). Most of these studies ran experiments with normal-hearing listeners to identify how different speech features contribute to correct recognition. It turned out that manipulating speech sounds, for example by replacing the vowel or by amplifying or attenuating certain parts of the sound in the time-frequency domain, leads normal-hearing listeners to identify a different speech sound. One goal of the current study is to investigate whether listeners with hearing loss respond similarly to such manipulations.
We designed a speech-based test that audiologists could use to determine which speech phones are susceptible to errors for each individual with hearing loss. The design includes a perceptual measure that corresponds to speech understanding in background noise, where the noise is similar to speech. The perceptual measure identifies the noise level at which an average normal-hearing listener recognizes the speech sound with at least 90% accuracy. The speech sounds in the test include combinations of 14 consonants {p, t, k, f, s, S, b, d, g, v, z, Z, m, n} and four vowels {A, ae, I, E}, to cover different features that are present in speech. All the test sounds had been pre-evaluated to make sure they are recognizable by normal-hearing listeners in the noise conditions of the experiments. Two sets of sounds, named T1 and T2, with the same consonant-vowel combinations but different talkers, were presented to the listeners at their most comfortable listening level (not adjusted for their specific hearing loss). The two speech sets had distinct perceptual measures. When two sounds with similar perceptual measures, the same consonant, and different vowels are presented to a listener with hearing loss, the listener's responses can show how their particular hearing function may cause errors in understanding that speech sound, and why it led to recognition of a specific sound instead of the one presented. Presenting sounds from the two sets also provides a means to examine the role of the perceptual measure (which is based on normal-hearing listeners) for listeners with hearing loss. When a listener's recognition score increases as the result of a change in the presented speech sound, this indicates how the hearing aid fitting should proceed for that particular (listener, speech sound) pair.
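To make the perceptual measure more concrete, here is a minimal sketch of one way to estimate, for a single CV token, the signal-to-noise ratio (SNR) at which average normal-hearing recognition first reaches 90%. It is an illustration under stated assumptions, not the authors' procedure: it assumes accuracy has been measured at a handful of SNRs and that the threshold can be found by linear interpolation between neighboring test conditions. The function name `snr_at_criterion` and all numbers are hypothetical.

```python
from typing import Optional, Sequence


def snr_at_criterion(
    snrs_db: Sequence[float],
    accuracies: Sequence[float],
    criterion: float = 0.9,
) -> Optional[float]:
    """Estimate the lowest SNR (dB) at which recognition accuracy reaches `criterion`.

    Hypothetical helper: `snrs_db` are the tested signal-to-noise ratios and
    `accuracies` the mean proportion correct for normal-hearing listeners at
    each one. Returns None if accuracy never reaches the criterion.
    """
    if len(snrs_db) != len(accuracies) or not snrs_db:
        raise ValueError("need equal-length, non-empty SNR and accuracy lists")
    # Sort the test conditions from noisiest (lowest SNR) to least noisy.
    points = sorted(zip(snrs_db, accuracies))
    # Already at or above the criterion in the noisiest condition tested.
    if points[0][1] >= criterion:
        return points[0][0]
    # Find the first pair of neighboring conditions that brackets the
    # criterion and interpolate linearly between them (y1 > y0, so no
    # division by zero).
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 < criterion <= y1:
            return x0 + (criterion - y0) * (x1 - x0) / (y1 - y0)
    return None


# Illustrative (made-up) accuracies for one CV token at four SNRs.
threshold = snr_at_criterion([-12, -6, 0, 6], [0.40, 0.70, 0.95, 1.00])
print(f"90% threshold: {threshold:.1f} dB SNR")
```

With these made-up numbers, accuracy crosses 90% between -6 dB (70%) and 0 dB (95%), so the function returns roughly -1.2 dB. A lower threshold means the token stays recognizable in more noise. Linear interpolation is the simplest choice; a real analysis might instead fit a smooth psychometric function to the measured accuracies.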
Although the study shows that improvement or degradation in recognition is listener dependent, on average recognition improved for 85% of sounds when we replaced a CV with the same CV having a better perceptual measure. Additionally, among CVs with similar perceptual measures, replacing the vowel improved recognition on average for 28% of CVs with vowel {A}, 28% with vowel {E}, 25% with vowel {ae}, and 19% with vowel {I}.
The confusion pattern in each case provides insight into how these changes affect phone recognition in each ear. We propose to prescribe hearing aid amplification tailored to individual ears, based on the confusion pattern, the response to a change in perceptual measure, and the response to a change in vowel.
These tests are directed at fine-tuning hearing aid insertion gain, with the ultimate goals of improving speech perception and of identifying precisely when, and for which consonants, an ear with hearing loss needs treatment to enhance speech recognition.