1aPP44 – What’s That Noise?  The Effect of Hearing Loss and Tinnitus on Soldiers Using Military Headsets – Candice Manning, AuD, PhD

Candice Manning, AuD, PhD – Candice.Manning@va.gov

Timothy Mermagen, BS – timothy.j.mermagen.civ@mail.mil

Angelique Scharine, PhD – angelique.s.scharine.civ@mail.mil

Human and Intelligent Agent Integration Branch (HIAI)
Human Research and Engineering Directorate
U.S. Army Research Laboratory
Building 520
Aberdeen Proving Ground, MD

Lay language paper 1aPP44, “Speech recognition performance of listeners with normal hearing, sensorineural hearing loss, and sensorineural hearing loss and bothersome tinnitus when using air and bone conduction communication headsets”

Presented Monday Morning, May 23, 2016, 8:00 – 12:00, Salon E/F

171st ASA Meeting, Salt Lake City

Military personnel are at high risk for noise-induced hearing loss due to the unprecedented proportion of blast-related acoustic trauma experienced during deployment from high-level impulsive and continuous noise (e.g., transportation vehicles, weaponry, blast exposure). In fact, noise-induced hearing loss is the primary injury of United States Soldiers returning from Afghanistan and Iraq. Ear injuries, including tympanic membrane perforation, hearing loss, and tinnitus, greatly affect a Soldier’s hearing acuity and, as a result, reduce situational awareness and readiness. Hearing protection devices are accessible to military personnel; however, it has been noted that many troops forgo hearing protection, believing it may reduce their responsiveness to their surroundings during combat.

Noise-induced hearing loss is highly associated with tinnitus, the experience of perceiving sound that is not produced by a source outside of the body. Chronic tinnitus causes functional impairment that may lead a sufferer to seek help from an audiologist or other healthcare professional. Because there is no cure for chronic tinnitus, intervention and management are the only options for those who suffer from it. Tinnitus can affect every aspect of an individual’s life, including sleep, daily tasks, relaxation, and conversation, to name only a few. In 2011, the United States Government Accountability Office report on noise indicated that tinnitus was the most prevalent service-connected disability. The combination of noise-induced hearing loss and the perception of tinnitus could greatly impact a Soldier’s ability to rapidly and accurately process speech information in high-stress situations.

The prevalence of hearing loss and tinnitus within the military population suggests that Soldier use of hearing protection is extremely important. Building hearing protection into reliable communication devices should increase the likelihood that Soldiers use it. Military communication devices using air and bone conduction provide clear two-way audio communications through a headset and a microphone.

Air conduction headsets offer passive hearing protection from high ambient noise, and talk-through microphones allow the user to engage in face-to-face conversation and hear ambient environmental sounds, preserving situational awareness. Bone conduction technology uses the bone conduction pathway and presents auditory information differently than air conduction devices (see Figure 1). Because headsets with bone conduction transducers do not cover the ears, they allow the user to hear the surrounding environment while retaining the option to communicate over a radio network. Worn with or without hearing protection, bone conduction devices are inconspicuous and fit easily under the helmet. Bone conduction communication devices have been used in the past; however, as newer devices have been designed, they have not been widely adopted for military applications.








Figure 1. Air and Bone conduction headsets used during study: a) Invisio X5 dual in-ear headset and X50 control unit and b) Aftershockz Sports 2 headset.


Since many military personnel operate in high-noise environments and with some degree of noise-induced hearing damage and/or tinnitus, it is important to understand how speech recognition performance might be altered as a function of headset use. This is an important subject to evaluate because two auditory pathways (the air conduction pathway and the bone conduction pathway) are responsible for hearing perception. Comparing air and bone conduction devices across different hearing populations will help to describe the overall effects not only of hearing loss, an extremely common disability within the military population, but also of tinnitus on situational awareness. Additionally, if there are differences between the two types of headsets, this information will help to guide future communication device selection for each population: normal hearing (NH), sensorineural hearing loss (SNHL), and sensorineural hearing loss with tinnitus (SNHL/Tinnitus).

Based on findings from the literature on speech understanding in noise, communication devices do have a negative effect on speech intelligibility within the military population when noise is present. However, it is uncertain how hearing loss and/or tinnitus affects speech intelligibility and situational awareness in high-level noise environments. This study looked at recognition of words presented over air conduction (AC) and bone conduction (BC) headsets in three groups of listeners: those with normal hearing, those with sensorineural hearing loss, and those with sensorineural hearing loss and bothersome tinnitus. Three speech-to-noise ratios (SNR = 0, -6, and -12) were created by embedding speech items in pink noise. Overall, performance was marginally but significantly better for the Aftershockz bone conduction headset (Figure 2). As would be expected, performance increases as the speech-to-noise ratio increases (Figure 3).
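As a rough illustration of what embedding speech in noise at a fixed speech-to-noise ratio involves, the sketch below scales a noise signal so that the ratio of speech level to noise level hits a target value. It is not the procedure used in the study. It assumes the SNR values are in decibels and are defined on root-mean-square levels, and it uses a sine tone and white Gaussian noise as stand-ins for the speech items and the pink noise. All function names are hypothetical.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def noise_gain_for_snr(speech, noise, snr_db):
    """Gain to apply to `noise` so that 20*log10(rms(speech)/rms(noise)) == snr_db."""
    return rms(speech) / (rms(noise) * 10 ** (snr_db / 20))


def mix_at_snr(speech, noise, snr_db):
    """Add the rescaled noise to the speech, sample by sample."""
    gain = noise_gain_for_snr(speech, noise, snr_db)
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, scaled_noise):
    """Speech-to-noise ratio in dB, computed from RMS levels."""
    return 20 * math.log10(rms(speech) / rms(scaled_noise))


# Stand-in signals (assumptions): a 200 Hz tone for "speech" and
# white Gaussian noise instead of the pink noise used in the study.
random.seed(0)
speech = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(8000)]
noise = [random.gauss(0.0, 1.0) for _ in range(8000)]

for snr_db in (0, -6, -12):
    gain = noise_gain_for_snr(speech, noise, snr_db)
    scaled = [gain * n for n in noise]
    print(snr_db, round(measured_snr_db(speech, scaled), 6))
```

Under these assumptions, an SNR of 0 means speech and noise have the same level, and each 6-unit step down roughly doubles the noise amplitude relative to the speech.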

One of the most fascinating things about the data is that, although the effect of hearing profile was statistically significant, it was not practically so: the means for the Normal Hearing, Hearing Loss, and Tinnitus groups were 65, 61, and 63, respectively (Figure 4). Hearing profile also did not interact with any of the other variables under test. One might conclude from the data that if the listener can control the level of presentation, the speech-to-noise ratio has about the same effect regardless of hearing loss. There was no practical difference in performance with the TCAPS (tactical communication and protective systems, here the two headsets) due to one’s hearing profile; however, the Aftershockz headset provided better speech intelligibility for all listeners.


Figure 2.  Mean rationalized arcsine units measured for each of the TCAPS under test.


Figure 3. Mean rationalized arcsine units measured as a function of speech to noise ratio.



Figure 4.  Mean rationalized arcsine units observed as a function of the hearing profile of the listener.
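The figures above report scores in rationalized arcsine units (RAU). The paper does not say which formula was used, so the sketch below assumes the commonly cited form attributed to Studebaker (1985), which stretches scores near 0% and 100% so that differences are more comparable across the range. The function name and the 50-word list length are hypothetical.

```python
import math


def rationalized_arcsine_units(correct, total):
    """Convert a score of `correct` out of `total` items to RAU.

    Assumed form: theta = asin(sqrt(X/(N+1))) + asin(sqrt((X+1)/(N+1))),
    RAU = (146/pi) * theta - 23.
    """
    theta = (math.asin(math.sqrt(correct / (total + 1)))
             + math.asin(math.sqrt((correct + 1) / (total + 1))))
    return (146.0 / math.pi) * theta - 23.0


# Hypothetical 50-item word list: print RAU for a few raw scores.
for correct in (0, 10, 25, 40, 50):
    print(correct, round(rationalized_arcsine_units(correct, 50), 2))
```

With this form, a score in the middle of the range maps to a RAU value close to the percent-correct score, while scores near the floor or ceiling are spread out.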

5aSCb17 – Pronunciation differences: Gender and ethnicity in Southern English – Wendy Herd, Devan Torrence, and Joy Cariño


Wendy Herd – wherd@english.msstate.edu

Devan Torrence – dct74@msstate.edu

Joy Cariño – carinoj16@themsms.org


Linguistics Research Laboratory
English Department
Mississippi State University
Mississippi State, MS 39762


Popular version of paper 5aSCb17, “Prevoicing differences in Southern English: Gender and ethnicity effects”

Presented Friday morning, May 27, 10:05 – 12:00 in Salon F

171st ASA Meeting, Salt Lake City


We often notice differences in pronunciation between ourselves and other speakers. More noticeable differences, like the Southern drawl or the New York City pronunciation yuge instead of huge, are even used overtly when we guess where a given speaker is from. Our speech also varies in more subtle ways.

If you hold your hand in front of your mouth when saying tot and dot aloud, you will be able to feel a difference in the onset of vocal fold vibration. Tot begins with a sound that lacks vocal fold vibration, so a large rush of air can be felt on the hand at the beginning of the word. No such rush of air can be felt at the beginning of dot because it begins with a sound with vocal fold vibration. A similar difference can be felt when comparing [p] of pot to [b] of bot and [k] of cot to [ɡ] of got. This difference between [t] and [d] is very noticeable, but the timing of our vocal fold vibration also varies each time we pronounce a different version of [t] or [d].

Our study is particularly focused, not on the large difference between sounds like [t] and [d], but on how speakers produce the smaller differences between different [d] pronunciations. For example, an English [d] might be pronounced with no vocal fold vibration before the [d] as shown in Figure 1(a) or with vocal fold vibration before the [d] as shown in Figure 1(b). As can be heard in the accompanying sound files, the difference between these two [d] pronunciations is less noticeable for English speakers than the difference between [t] and [d].

Figure 1. Spectrogram of (a) dot with no vocal fold vibration before [d] and (b) dot with vocal fold vibration before [d]. (Only the first half of dot is shown.)

We compared the pronunciations of 40 native speakers of English from Mississippi to see if some speakers were more likely to vibrate their vocal folds before [b, d, ɡ] rather than shortly after those sounds. These speakers included equal numbers of African American participants (10 women, 10 men) and Caucasian American participants (10 women, 10 men).


Previous research found that men were more likely to vibrate their vocal folds before [b, d, ɡ] than women, but we found no such gender differences [1]. Men and women from Mississippi employed vocal fold vibration similarly. Instead, we found a clear effect of ethnicity. African American participants produced vocal fold vibration before initial [b, d, ɡ] 87% of the time while Caucasian American participants produced vocal fold vibration before these sounds just 37% of the time. This striking difference, which can be seen in Figure 2, is consistent with a previous smaller study that found ethnicity effects in vocal fold vibration among young adults from Florida [1, 2]. It is also consistent with descriptions of regional variation in vocal fold vibration [3].


Figure 2. Percentage of pronunciations produced with vocal fold vibration before [b, d, ɡ] displayed by ethnicity and gender.

The results suggest that these pronunciation differences are due to dialect variation. African American speakers from Mississippi appear to systematically use vocal fold vibration before [b, d, ɡ] to differentiate them from [p, t, k], but the Caucasian American speakers are using the cue differently and less frequently. Future research in the perception of these sounds could shed light on how speakers of different dialects vary in the way they interpret this cue. For example, if African American speakers are using this cue to differentiate [d] from [t], but Caucasian American speakers are using the same cue to add emphasis or to convey emotion, it is possible that listeners sometimes use these cues to (mis)interpret the speech of others without ever realizing it. We are currently attempting to replicate these results in other regions.

Each accompanying sound file contains two repetitions of the same word. The first repetition does not include vocal fold vibration before the initial sound, and the second repetition does include vocal fold vibration before the initial sound.

  1. Ryalls, J., Zipprer, A., & Baldauff, P. (1997). A preliminary investigation of the effects of gender and race on voice onset time. Journal of Speech, Language, and Hearing Research, 40(3), 642-645.
  2. Ryalls, J., Simon, M., & Thomason, J. (2004). Voice onset time production in older Caucasian- and African-Americans. Journal of Multilingual Communication Disorders, 2(1), 61-67.
  3. Jacewicz, E., Fox, R.A., & Lyle, S. (2009). Variation in stop consonant voicing in two regional varieties of American English. Journal of the International Phonetic Association, 39(3), 313-334.



3pSC10 – Does increasing the playback speed of men’s and women’s voices reduce their intelligibility by the same amount? – Eric M. Johnson, Sarah Hargus Ferguson


Eric M. Johnson – eric.martin.johnson@utah.edu

Sarah Hargus Ferguson – sarah.ferguson@hsc.utah.edu

Department of Communication Sciences and Disorders
University of Utah
390 South 1530 East, Room 1201
Salt Lake City, UT 84112


Popular version of poster 3pSC10, “Gender and rate effects on speech intelligibility.”

Presented Wednesday afternoon, May 25, 2016, 1:00, Salon G

171st ASA Meeting, Salt Lake City

Older adults seeking hearing help often report having an especially hard time understanding women’s voices. However, this anecdotal observation doesn’t always agree with the findings from scientific studies. For example, Ferguson (2012) found that male and female talkers were equally intelligible for older adults with hearing loss. Moreover, several studies have found that young people with normal hearing actually understand women’s voices better than men’s voices (e.g. Bradlow et al., 1996; Ferguson, 2004). In contrast, Larsby et al. (2015) found that, when listening in background noise, groups of listeners with and without hearing loss were better at understanding a man’s voice than a woman’s voice. The Larsby et al. data suggest that female speech might be more affected by distortion like background noise than male speech is, which could explain why women’s voices may be harder to understand for some people.

We were interested to see if another type of distortion, speeding up the speech, would have an equal effect on the intelligibility of men and women. Speech that has been sped up (or time-compressed) has been shown to be less intelligible than unprocessed speech (e.g. Gordon-Salant & Friedman, 2011), but no studies have explored whether time compression causes an equal loss of intelligibility for male and female talkers. If an increase in playback speed causes women’s speech to be less intelligible than men’s, it could reveal another possible reason why so many older adults with hearing loss report difficulty understanding women’s voices. To this end, our study tested whether the intelligibility of time-compressed speech decreases for female talkers more than it does for male talkers.

Using 32 listeners with normal hearing, we measured how much the intelligibility of two men and two women went down when the playback speed of their speech was increased by 50%. These four talkers were selected based on their nearly equivalent conversational speaking rates. We used digital recordings of each talker and made two different versions of each sentence they spoke: a normal-speed version and a fast version. The software we used allowed us to speed up the recordings without making them sound high-pitched.

Audio sample 1: A sentence at its original speed.

Audio sample 2: The same sentence sped up to 50% faster than its original speed.
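To make the "50% faster" manipulation concrete, the short sketch below works out what that speed-up does to a sentence's duration. It only does the arithmetic. It does not perform the pitch-preserving time compression itself, and the 3-second sentence length and function names are hypothetical.

```python
def playback_rate(speedup_percent):
    """A speed-up of 50% corresponds to a playback rate of 1.5x."""
    return 1.0 + speedup_percent / 100.0


def compressed_duration(original_seconds, speedup_percent):
    """Duration of a recording after it is sped up by the given percentage."""
    return original_seconds / playback_rate(speedup_percent)


original = 3.0  # hypothetical sentence duration in seconds
fast = compressed_duration(original, 50)
print(f"rate: {playback_rate(50):.2f}x")
print(f"duration: {original:.2f} s -> {fast:.2f} s")
print(f"time removed: {100 * (1 - fast / original):.1f}%")
```

Note that speeding speech up by 50% shortens it by one third, not by half: the listener hears the same words in two thirds of the original time.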


All of the sentences were presented to the listeners in background noise. We found that the men and women were essentially equally intelligible when listeners heard the sentences at their original speed. Speeding up the sentences made all of the talkers harder to understand, but the effect was much greater for the female talkers than the male talkers. In other words, there was a significant interaction between talker gender and playback speed. The results suggest that time-compression has a greater negative effect on the intelligibility of female speech than it does on male speech.

Figure 1: Overall percent correct key-word identification performance for male and female talkers in unprocessed and time-compressed conditions. Error bars indicate 95% confidence intervals.


These results confirm the negative effects of time-compression on speech intelligibility and imply that audiologists should counsel the communication partners of their patients to avoid speaking excessively fast, especially if the patient complains of difficulty understanding women’s voices. This counsel may be even more important for the communication partners of patients who experience particular difficulty understanding speech in noise.


  1. Bradlow, A. R., Torretta, G. M., and Pisoni, D. B. (1996). “Intelligibility of normal speech I: Global and fine-grained acoustic-phonetic talker characteristics,” Speech Commun. 20, 255-272.
  2. Ferguson, S. H. (2004). “Talker differences in clear and conversational speech: Vowel intelligibility for normal-hearing listeners,” J. Acoust. Soc. Am. 116, 2365-2373.
  3. Ferguson, S. H. (2012). “Talker differences in clear and conversational speech: Vowel intelligibility for older adults with hearing loss,” J. Speech Lang. Hear. Res. 55, 779-790.
  4. Gordon-Salant, S., and Friedman, S. A. (2011). “Recognition of rapid speech by blind and sighted older adults,” J. Speech Lang. Hear. Res. 54, 622-631.
  5. Larsby, B., Hällgren, M., Nilsson, L., and McAllister, A. (2015). “The influence of female versus male speakers’ voice on speech recognition thresholds in noise: Effects of low-and high-frequency hearing impairment,” Speech Lang. Hear. 18, 83-90.



3aBA1 – Ultrasound-Mediated Drug Targeting to Tumors: Revision of Paradigms Through Intravital Imaging – Natalya Rapoport


Natalya Rapoport – natasha.rapoport@utah.edu

Department of Bioengineering
University of Utah
36 S. Wasatch Dr., Room 3100
Salt Lake City, Utah 84112

Popular version of paper 3aBA1, “Ultrasound-mediated drug targeting to tumors: Revision of paradigms through intravital imaging”

Presented Wednesday morning, May 25, 2016, 8:15 AM in Salon H

171st ASA Meeting, Salt Lake City


More than a century ago, Nobel Prize laureate Paul Ehrlich formulated the idea of a “magic bullet”: a hypothetical drug that hits its target while bypassing healthy tissues. No field of medicine could benefit more from the development of a “magic bullet” than cancer chemotherapy, which is complicated by severe side effects. For decades, the prospects of developing “magic bullets” remained elusive. During the last decade, progress in nanomedicine has enabled tumor-targeted delivery of anticancer drugs via their encapsulation in tiny carriers called nanoparticles. Nanoparticle tumor targeting is based on the “Achilles’ heel” of cancerous tumors: their poorly organized and leaky microvasculature. Due to their size, nanoparticles cannot penetrate the tight vasculature of healthy tissue. In contrast, nanoparticles do penetrate the leaky tumor microvasculature, which leads to localized accumulation in tumor tissue. After drug-loaded nanoparticles accumulate in the tumor, the drug should be released from the carrier so that it can reach its site of action (usually located in the cell cytoplasm or nucleus). Local release of an encapsulated drug may be triggered by tumor-directed ultrasound. Ultrasound has additional benefits: it enhances nanoparticle penetration through blood vessel walls (extravasation) as well as drug uptake (internalization) by tumor cells.

For decades, ultrasound has been used only as an imaging modality; the development of microbubbles as ultrasound contrast agents in the early 2000s revolutionized imaging. Recently, microbubbles have attracted attention as drug carriers and enhancers of drug and gene delivery. Microbubbles could have been ideal carriers for the ultrasound-mediated delivery of anticancer drugs. Unfortunately, their micron-scale size does not allow effective extravasation from the tumor microvasculature into tumor tissue. In Dr. Rapoport’s lab, this problem has been solved by the development of nanoscale microbubble precursors, namely drug-loaded nanodroplets that convert into microbubbles under the action of ultrasound [1-6]. Nanodroplets comprise a liquid core formed by a perfluorocarbon compound and a two-layered drug-containing polymeric shell (Figure 1. Schematic representation of a drug-loaded nanodroplet). An aqueous dispersion of nanodroplets is called a nanoemulsion.

A suggested mechanism of therapeutic action of drug-loaded perfluorocarbon nanoemulsions is as follows [3, 5, 6]. The nanoscale size of the droplets (ca. 250 nm) allows them to extravasate into tumor tissue while bypassing normal tissues, which is the basis of tumor targeting. Once nanodroplets have accumulated in the tumor, tumor-directed ultrasound triggers their conversion into microbubbles, which in turn triggers release of the nanodroplet-encapsulated drug. This is because, during droplet-to-bubble conversion, particle volume increases about a hundred-fold, with a corresponding decrease in shell thickness. Microbubbles oscillate in the ultrasound field, which “rips” the drug off the thin microbubble shell (Figure 2. Schematic representation of the mechanism of drug release from perfluorocarbon nanodroplets triggered by ultrasound-induced droplet-to-bubble conversion; PFC – perfluorocarbon). In addition, oscillating microbubbles enhance internalization of the released drug by tumor cells.
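The geometric consequence of a hundred-fold volume increase can be estimated with simple sphere arithmetic. The sketch below is an idealization, not a model from the paper: it assumes perfectly spherical particles and a thin shell whose material volume is conserved as it spreads over the larger surface. The function names are hypothetical.

```python
def bubble_diameter_nm(droplet_diameter_nm, volume_ratio):
    """Diameter after the volume grows by `volume_ratio` (diameter scales as V^(1/3))."""
    return droplet_diameter_nm * volume_ratio ** (1.0 / 3.0)


def shell_thinning_factor(volume_ratio):
    """Factor by which a thin shell of fixed material volume gets thinner.

    Surface area scales as V^(2/3), so thickness scales as V^(-2/3).
    """
    return volume_ratio ** (2.0 / 3.0)


droplet_nm = 250.0    # approximate droplet size quoted in the text
volume_ratio = 100.0  # "about a hundred-fold" volume increase

print(f"bubble diameter: {bubble_diameter_nm(droplet_nm, volume_ratio):.0f} nm")
print(f"shell thinner by: {shell_thinning_factor(volume_ratio):.1f}x")
```

Under these assumptions, a 250 nm droplet becomes a bubble a little over one micron across, and its shell becomes roughly twenty times thinner, which is consistent with the picture of a drug held on a thin, easily disrupted shell.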

This tumor treatment modality has been tested in mice bearing breast, ovarian, or pancreatic tumors and has proved very effective. Dramatic tumor regression, and sometimes complete resolution, was observed when optimal nanodroplet composition and ultrasound parameters were applied (Figure 3) [3, 4, 6].

Figure 3. A – Photographs of a mouse bearing a subcutaneously grown breast cancer tumor xenograft treated by four systemic injections of the nanodroplet-encapsulated anticancer drug paclitaxel (PTX) at a dose of 40 mg/kg as PTX. B – Photographs of a mouse bearing two ovarian carcinoma tumors (a) immediately before and (b) three weeks after the end of treatment; the mouse was treated by four systemic injections of the nanodroplet-encapsulated PTX at a dose of 20 mg/kg as PTX; only the right tumor was sonicated. C – Photographs (a, c) and fluorescence images (b, d) of a mouse bearing a fluorescent pancreatic tumor taken before (a, b) and three weeks after (c, d) the one-time treatment with PTX-loaded nanodroplets at a dose of 40 mg/kg as PTX. The tumor was completely resolved and never recurred.

In the current presentation, the proposed mechanism of therapeutic action of drug-loaded, ultrasound-activated perfluorocarbon nanoemulsions has been tested using intravital laser fluorescence microscopy performed in collaboration with Dr. Brian O’Neill (then with Houston Methodist Research Institute, Houston, Texas) [2]. Fluorescently labeled nanocarrier particles (or a fluorescently labeled drug) were systemically injected through the tail vein into anesthetized live mice bearing subcutaneously grown pancreatic tumors. Nanocarrier and drug arrival and extravasation in the region of interest (i.e., normal or tumor tissue) were quantitatively monitored. Drug nanocarriers of increasing size were tested: individual polymeric molecules; tiny micelles formed by self-assembly of these molecules; and nanodroplets formed from micelles. The results confirmed the mechanism discussed above.

  • As expected, dramatic differences in the extravasation rates of nanoparticles were observed.
  • The extravasation of individual polymer molecules was extremely fast even in normal (thigh muscle) tissue; in contrast, the extravasation of nanodroplets into normal tissue was very slow. (Figure 4. A – Bright field image of the adipose and thigh muscle tissue. B, C – extravasation of individual molecules (B – 0 min; C – 10 min after injection); vasculature lost fluorescence while tissue fluorescence increased. D, E – extravasation of nanodroplets; blood vessel fluorescence was retained for an hour of observation (D – 30 min; E – 60 min after injection).)
  • Nanodroplet extravasation into tumor tissue was substantially faster than into normal tissue, which provides for effective nanodroplet tumor targeting.
  • Tumor-directed ultrasound significantly enhanced extravasation and tumor accumulation of both micelles and nanodroplets (Figure 5. Effect of ultrasound on extravasation. Fluorescence of blood vessels dropped while that of the tumor tissue increased after ultrasound). Also note the very irregular tumor microvasculature, in contrast with that of the normal tissue shown in Figure 4.
  • The ultrasound effect on nanodroplets was 3-fold stronger than that on micelles, making nanodroplets better drug carriers for ultrasound-mediated drug delivery.
  • On the negative side, some premature drug release into the circulation, preceding tumor accumulation, was observed. This suggests directions for further improvement of nanoemulsion formulations.


4pMU4 – How Well Can a Human Mimic the Sound of a Trumpet? – Ingo R. Titze

Ingo R. Titze – ingo.titze@utah.edu

University of Utah
201 Presidents Cir
Salt Lake City, UT

Popular version of paper 4pMU4 “How well can a human mimic the sound of a trumpet?”

Presented Thursday May 26, 2:00 pm, Solitude room

171st ASA Meeting Salt Lake City


Man-made musical instruments are sometimes designed or played to mimic the human voice, and likewise vocalists try to mimic the sounds of man-made instruments.  If flutes and strings accompany a singer, a “brassy” voice is likely to produce mismatches in timbre (tone color or sound quality).  Likewise, a “fluty” voice may not be ideal for a brass accompaniment.  Thus, singers are looking for ways to color their voice with variable timbre.

Acoustically, brass instruments are close cousins of the human voice.  It was discovered prehistorically that sending sound over long distances (to locate, be located, or warn of danger) is made easier when a vibrating sound source is connected to a horn.  It is not known which came first – blowing hollow animal horns or sea shells with pursed and vibrating lips, or cupping the hands to extend the airway for vocalization. In both cases, however, airflow-induced vibration of soft tissue (vocal folds or lips) is enhanced by a tube that resonates the frequencies and radiates them (sends them out) to the listener.

Around 1840, theatrical singing by males went through a revolution. Men wanted to portray more masculinity and raw emotion with vocal timbre. The “Do di Petto”, Italian for “C in chest voice”, was introduced by operatic tenor Gilbert Duprez in 1837 and soon became a phenomenon. A heroic voice in opera took on more of a brass-like quality than a flute-like quality. Similarly, in the early to mid-twentieth century (1920-1950), female singers were driven by the desire to sing with a richer timbre, one that matched brass and percussion instruments rather than strings or flutes. Ethel Merman became an icon in this revolution. This led to the theatre belt sound produced by females today, which has much in common with a trumpet sound.


Fig.1.  Mouth opening to head-size ratio for Ethel Merman and corresponding frequency spectrum for the sound “aw” with a fundamental frequency fo (pitch) at 547 Hz and a second harmonic frequency 2 fo at 1094 Hz.


The length of an uncoiled trumpet horn is about 2 meters (including the full length of the valves), whereas the length of a human airway above the glottis (the space between the vocal cords) is only about 17 cm (Fig. 2). The vibrating lips and the vibrating vocal cords can produce similar pitch ranges, but the resonators have vastly different natural frequencies due to the more than 10:1 ratio in airway length.  So, we ask, how can the voice produce a brass-like timbre in a “call” or “belt”?

One structural similarity between the human instrument and the brass instrument is the shape of the airway directly above the glottis, a short and narrow tube formed by the epiglottis.  It corresponds to the mouthpiece of brass instruments.  This mouthpiece plays a major role in shaping the sound quality.  A second structural similarity is created when a singer uses a wide mouth opening, simulating the bell of the trumpet.  With these two structural similarities, the spectrum of tones produced by the two instruments can be quite similar, despite the huge difference in the overall length of the instrument.



Fig 2.  Human airway and trumpet (not drawn to scale).
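To get a feel for how much the length difference matters, the sketch below estimates resonance frequencies for an idealized uniform tube closed at one end and open at the other (a quarter-wave resonator). This is a deliberate simplification: a real trumpet, with its mouthpiece and bell, and a real vocal tract, with its changing shape, both deviate from it. The speed of sound of 350 m/s and the function name are assumptions.

```python
def tube_resonances(length_m, count, speed_of_sound=350.0):
    """First `count` resonances (Hz) of a uniform tube closed at one end.

    Quarter-wave resonator: f_k = (2k - 1) * c / (4 * L), k = 1, 2, ...
    """
    return [(2 * k - 1) * speed_of_sound / (4.0 * length_m)
            for k in range(1, count + 1)]


airway = tube_resonances(0.17, 3)   # about 17 cm human airway
trumpet = tube_resonances(2.0, 3)   # about 2 m uncoiled trumpet

print("airway :", [round(f) for f in airway])
print("trumpet:", [round(f) for f in trumpet])
print("ratio of lowest resonances:", round(airway[0] / trumpet[0], 1))
```

In this idealization, the lowest resonance scales inversely with length, so the ratio of the lowest resonances equals the length ratio, a bit more than 10:1.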


Acoustically, the call or belt-like quality is achieved by strengthening the second harmonic frequency 2fo in relation to the fundamental frequency fo. In the human instrument, this can be done by choosing a bright vowel like /æ/ that puts an airway resonance near the second harmonic. The fundamental frequency will then have significantly less energy than the second harmonic.

Why does that resonance adjustment produce a brass-like timbre?  To understand this, we first recognize that, in brass-instrument playing, the tones produced by the lips are entrained (synchronized) to the resonance frequencies of the tube.  Thus, the tones heard from the trumpet are the resonance tones. These resonance tones form a harmonic series, but the fundamental tone in this series is missing.  It is known as the pedal tone.  Thus, by design, the trumpet has a strong second harmonic frequency with a missing fundamental frequency.

Perceptually, an imaginary fundamental frequency may be produced by our auditory system when a series of higher harmonics (equally spaced overtones) is heard.  Thus, the fundamental (pedal tone) may be perceptually present to some degree, but the highly dominant second harmonic determines the note that is played.

In belting and loud calling, the fundamental is not eliminated, but suppressed relative to the second harmonic.  The timbre of belt is related to the timbre of a trumpet due to this lack of energy in the fundamental frequency.  There is a limit, however, in how high the pitch can be raised with this timbre.  As pitch goes up, the first resonance of the airway has to be raised higher and higher to maintain the strong second harmonic.  This requires ever more mouth opening, literally creating a trumpet bell (Fig. 3).


Fig 3. Mouth opening to head-size ratio for Idina Menzel and corresponding frequency spectrum for a belt sound with a fundamental frequency (pitch) at 545 Hz.


Note the strong second harmonic frequency 2fo in the spectrum of frequencies produced by Idina Menzel, a current musical theatre singer.

One final comment about the perceived pitch of a belt sound is in order. Pitch perception is related not only to the fundamental frequency but also to the entire spectrum of frequencies. The strong second harmonic influences pitch perception. The belt timbre on a D5 (587 Hz) results in a higher pitch perception for most people than a classical soprano sound on the same note. This adds to the excitement of the sound.
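The frequencies quoted in this article can be related with a little arithmetic. The sketch below lists the harmonics of a given fundamental and converts a note name to frequency, assuming equal temperament with A4 tuned to 440 Hz and the MIDI note-numbering convention (D5 is note 74). The function names are hypothetical.

```python
def harmonics(fundamental_hz, count):
    """First `count` harmonics: fo, 2fo, 3fo, ..."""
    return [fundamental_hz * k for k in range(1, count + 1)]


def note_hz(midi_note, a4_hz=440.0):
    """Equal-tempered frequency of a MIDI note number (A4 = note 69)."""
    return a4_hz * 2 ** ((midi_note - 69) / 12)


# Fundamental from the Ethel Merman example and its second harmonic.
print(harmonics(547, 2))

# D5, the note used in the pitch-perception comment above.
d5 = note_hz(74)
print(round(d5, 1), round(2 * d5, 1))
```

Because the belt sound relies on an airway resonance sitting near 2fo, every rise in pitch moves that target resonance up by twice as many hertz as the fundamental, which is why the mouth opening has to keep growing.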