ASA Lay Language Papers
162nd Acoustical Society of America Meeting

Perception of high-frequency sounds in singing and speech: Studying singing to learn about speech

Brian B. Monson
National Center for Voice and Speech
University of Utah
136 S. Main Street
Ste 320
Salt Lake City, UT 84101

A. Davi Vitela
Brad H. Story
Andrew J. Lotto
Department of Speech, Language, and Hearing Science
University of Arizona
PO Box 210071
Tucson, AZ 85721

Popular version of paper 5aSCb3
Presented Friday morning Nov. 4, 2011
162nd ASA Meeting, San Diego, Calif.

Ever wondered why people sound different on your cell phone than in person? You may already know the answer: your cell phone doesn’t transmit all of the sounds that the human voice creates. The voice can and normally does make sounds at high frequencies in the “treble” audio range (up to 20,000 Hz and higher) in the form of overtones and noise from consonants, but your cell phone cuts off anything above about 3400 Hz, leaving your brain to “fill in” the rest. What are you missing out on? That’s the subject of this study.

Figure 1. Spectrogram showing acoustical energy (sound) created by a human voice up to 20,000 Hz. The current cell phone bandwidth (shaded blue) only transmits sounds between about 300 and 3400 Hz. High-frequency energy above 5000 Hz (shaded red) has information potentially useful to the brain when perceiving singing and speech.
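
To get a rough feel for how much of the voice falls outside the phone band, here is a small back-of-the-envelope sketch. It assumes a steady voiced sound with a fundamental of 220 Hz (an illustrative value, not one taken from the study) and simply counts how many of its overtones land in each region named in Figure 1. It counts overtones only; it says nothing about how loud they are, and it ignores consonant noise.

```python
# Illustrative arithmetic only: count overtones of an assumed 220 Hz voice
# that fall inside the phone band versus the high-frequency region.
F0_HZ = 220                              # assumed fundamental (hypothetical value)
VOICE_TOP_HZ = 20000                     # upper limit mentioned in the text
PHONE_LOW_HZ, PHONE_HIGH_HZ = 300, 3400  # approximate phone band from Figure 1
HFE_START_HZ = 5000                      # start of the high-frequency region

# Overtones (harmonics) are whole-number multiples of the fundamental.
harmonics = [k * F0_HZ for k in range(1, VOICE_TOP_HZ // F0_HZ + 1)]

in_phone_band = [f for f in harmonics if PHONE_LOW_HZ <= f <= PHONE_HIGH_HZ]
high_frequency = [f for f in harmonics if f > HFE_START_HZ]

print(len(harmonics), len(in_phone_band), len(high_frequency))  # prints: 90 14 68
```

Under these assumptions, only 14 of the 90 overtones below 20,000 Hz sit inside the phone band, while 68 lie above 5000 Hz. Real voices put far less energy into the higher overtones than the lower ones, so the count overstates how much sound is lost, but it shows how much of the spectrum the phone never transmits.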

While normal human vocal sounds contain energy up to 20,000 Hz and normal-hearing listeners can hear sounds up to that frequency, anything above about 5000 Hz has historically been ignored in science. It has traditionally been believed that this high-frequency energy is perceptually insignificant for speech, which has led communication devices, like cell phones and hearing aids, to neglect the high frequencies, too. Only recently has this practice been called into question.

So why study singing? There were some clues that the high frequencies would be more important for singing than for speech. For example, if you crank up the “treble” on your radio or the equalizer (EQ) on your music player, the sound becomes very “bright” or “shrill”. If you turn it down, the sound becomes “dull” or “muffled”. These are qualitative descriptors, and sound engineers have known for years the importance of putting the right amount of “treble” in the EQ of vocals for a musical performance so that audiences will like the quality of the sound. Through studying singing, we found that people can be very sensitive to treble EQ changes in singing voices (Monson et al., 2011). Singing also has more high-frequency energy than normal speech (Monson, 2011). This adds to the argument that high frequencies could mostly affect qualitative percepts, since quality is much more important in the perception of singing than of speech (where understanding the message tends to be more important). The thought is that perhaps some of the best singers’ voices naturally have an optimal amount of high frequencies (in the form of overtones) that the listener’s ear likes.

While at first it was thought that any perceptual information gleaned from the high frequencies would have to do with the quality of singing and speech, recent research suggests that the high frequencies have the potential to play a much more significant perceptual role. In fact, we have found that when people are given only high-frequency energy from singing and speech (i.e., nothing that you get on your cell phone), they are able to identify the gender of the singer/talker, they can tell whether it is singing or speech, and they can even recognize the song and figure out what is being said. People can still perform these tasks very well when there is a lot of noise in the lower frequencies that could potentially cover up this information. These results are contrary to assumptions that have been held for years in speech and voice science, and they were discovered through scientific study of the singing voice.

Figure 2. Frequency spectrum showing the frequency content of HFE with speech-shaped low-frequency noise.

Want to give it a try? See if you can figure out the gender and even recognize the song or speech in these clips that have only high-frequency sounds (above 5700 Hz) and low-frequency noise (you may need a good-quality loudspeaker to do it):

example1 example2 example3 example4

(Hint: If you didn’t figure out the song, take a look at Figure 1.)
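
For the curious, the idea of keeping only the sound above 5700 Hz can be sketched in a few lines of code. This is a toy illustration, not the procedure used to make the clips: it assumes a made-up "voice" consisting of one low tone (1000 Hz) and one high tone (8000 Hz), an assumed sample rate of 44,100 Hz, and a simple "brick-wall" filter that deletes every frequency below the cutoff. It does not add the low-frequency noise.

```python
import cmath
import math

SAMPLE_RATE = 44100   # samples per second (assumed)
N = 441               # number of samples, so each frequency bin is 100 Hz wide
CUTOFF_HZ = 5700      # cutoff frequency mentioned in the text


def tone(freq_hz):
    """Return N samples of a sine wave at freq_hz."""
    return [math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE) for n in range(N)]


def dft(x):
    """Naive discrete Fourier transform (slow, but fine for a short signal)."""
    size = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / size) for n in range(size))
            for k in range(size)]


def idft(spectrum):
    """Naive inverse transform; returns the real part of each sample."""
    size = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / size)
                for k in range(size)).real / size
            for n in range(size)]


def high_pass(x, cutoff_hz):
    """Zero every frequency bin below cutoff_hz, then rebuild the signal."""
    size = len(x)
    bin_hz = SAMPLE_RATE / size
    kept = []
    for k, value in enumerate(dft(x)):
        freq = min(k, size - k) * bin_hz   # bins above size/2 mirror the lower ones
        kept.append(value if freq >= cutoff_hz else 0j)
    return idft(kept)


low = tone(1000)      # stands in for what a phone would transmit
high = tone(8000)     # stands in for the high-frequency energy
mixture = [a + b for a, b in zip(low, high)]

filtered = high_pass(mixture, CUTOFF_HZ)

# After filtering, only the 8000 Hz tone should remain.
residual = max(abs(f - h) for f, h in zip(filtered, high))
print(residual < 1e-6)
```

In this toy case the low tone disappears completely and the high tone passes through unchanged. Practical audio filters are more gradual than the all-or-nothing cutoff used here.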

What does all of this mean for you? Not only could it account for a good musical performance from your favorite singer, but it could also explain why you have a hard time understanding someone on your phone, especially when sitting on a noisy train or at a cocktail party. It is possible that transmitting high frequencies on cell phones (or hearing aids) could give you the information you need to understand better in these situations. Or perhaps you have noticed that the sound quality of Internet audio (e.g., Skype) is much better than that of your phone. That is in part because it transmits sounds up to about 7000 Hz. But what if it went even higher than that? As it turns out, the improvement you experience could be more than just quality.


Monson, B. B., Lotto, A. J., and Ternström, S. (2011). “Detection of high-frequency energy changes in sustained vowels produced by singers,” Journal of the Acoustical Society of America, 129, 2263-2268.

Monson, B. B. (2011). High-Frequency Energy in Singing and Speech. Doctoral dissertation submitted to University of Arizona.
