Brad Story – bstory@email.arizona.edu
Dept. of Speech, Language, and Hearing Sciences
University of Arizona
P.O. Box 210071
Tucson, AZ 85712
Popular version of paper 4pAAa10
Presented Thursday afternoon, October 30, 2014
168th ASA Meeting, Indianapolis
The human voice is a pattern of sound generated by both the mind and body, and carries with it information about a speaker’s mental and physical state. Qualities such as gender, age, physique, dialect, health, and emotion are often embedded in the voice, and can produce sounds that are comforting and pleasant, intense and urgent, sad and happy, and so on. The human voice can also project a sense of eeriness when the sound contains qualities that are human-like, but not necessarily typical of the speech that is heard on a daily basis. A person with an unusually large head and neck, for example, may produce highly intelligible speech, but it will be oddly dominated by low frequency sounds that reveal the atypical size of the talker. Excessively slow or fast speaking rates, strangely timed and irregular speech, as well as breathiness and tremor may all also contribute to an eeriness if produced outside the boundaries of typical speech.
The sound pattern of the human voice is produced by the respiratory system, the larynx, and the vocal tract. The larynx, located at the bottom of the throat, consists of a left and right vocal fold (often referred to as vocal cords) and a surrounding framework of cartilage and muscle. During breathing the vocal folds are spread far apart to allow for an easy flow of air to and from the lungs. To generate sound they are brought together firmly, allowing air pressure to build up below them. This forces the vocal folds into vibration, creating the sound waves that are the “raw material” to be formed into speech by the vocal tract. The length and mass of the vocal folds largely determine the vocal pitch and vocal quality. Small and light vocal folds will generally produce a high pitched sound, whereas a low pitch typically originates with large, heavy vocal folds.
The vocal tract is the airspace created by the throat and the mouth, whose shape at any instant of time depends on the positions of the tongue, jaw, lips, velum, and larynx. During speech it is a continuously changing tube-like structure that “sculpts” the raw sound produced by the vocal folds into a stream of vowels and consonants. The size and shape of the vocal tract impose another layer of information about the talker. A long throat and large mouth may transmit the impression of a large body, while more subtle features like the contour of the roof of the mouth may add characteristics that are unique to the talker.
For this study, speech was simulated with a mathematical representation of the vocal folds and vocal tract. Such simulations allow for modifications of size and shape of structures, as well as temporal aspects of speech. The goal was to simulate extremes in vocal tract length, unusual timing patterns of speech movements, and odd combinations of breathiness and tremor. The result can be both eerie and amusing because the sounds produced are almost human, but not quite.
Three examples are included to demonstrate these effects. The first is a set of seven simulations of the word “abracadabra” produced while gradually decreasing the vocal tract length from 22 cm to 6.6 cm, increasing the vocal pitch from very low to very high, and increasing the speaking rate from slow to fast. The longest and shortest vocal tracts are shown in Figure 1 and are both configured as “ah” vowels; for production of the entire word, the vocal tract shape continuously changes. The set of simulations can be heard in sound sample 1.
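As a rough illustration of why vocal tract length matters so much, the tract can be idealized as a uniform tube that is closed at the vocal folds and open at the lips. This is a textbook simplification and not the simulation model used in the study; the speed of sound (350 m/s) and the function name are assumptions made for this sketch. In such a tube the resonances fall at odd multiples of c/(4L), so shortening the tube raises every resonance in proportion.

```python
SPEED_OF_SOUND_CM_PER_S = 35000.0  # assumed value for warm, humid air


def tube_resonances(length_cm, count=3):
    """Resonances (Hz) of an idealized uniform tube, closed at one end and open at the other."""
    return [(2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4.0 * length_cm)
            for n in range(1, count + 1)]


# Lengths taken from the text and the Figure 1 caption.
for label, length_cm in [("longest simulated", 22.0),
                         ("typical adult male", 17.5),
                         ("typical adult female", 15.0),
                         ("shortest simulated", 6.6)]:
    resonances = ", ".join(f"{f:.0f}" for f in tube_resonances(length_cm))
    print(f"{label:>20} ({length_cm:4.1f} cm): {resonances} Hz")
```

Under these assumptions the 17.5 cm tube has resonances near 500, 1500, and 2500 Hz, and the resonances of the 6.6 cm tube are about 3.3 times higher than those of the 22 cm tube. A real vocal tract is never a uniform tube, so these numbers indicate only the overall trend.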
Although it may be tempting to assume that the changes present in sound sample 1 are similar to simply increasing the playback speed of the audio, the changes are based on physiological scaling of the vocal tract and vocal folds, as well as an increase in the speaking rate. Sound sample 2 contains the same seven simulations except that the speaking rate is exactly the same in each case, eliminating the sense of increased playback speed.
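The difference can be sketched with a little arithmetic. Speeding up a recording by some factor multiplies every frequency by that factor and divides the duration by it, so pitch, resonances, and speaking rate are locked together. In the simulations they are separate quantities. The function below and the example numbers (a 100 Hz pitch, a 500 Hz resonance, a 1.2 s word) are hypothetical values chosen for illustration, not figures from the study.

```python
def speed_up_playback(pitch_hz, resonance_hz, duration_s, factor):
    """Effect of playing a recording `factor` times faster: all three values change together."""
    return pitch_hz * factor, resonance_hz * factor, duration_s / factor


# Hypothetical starting values, chosen only for illustration.
pitch_hz, resonance_hz, duration_s = 100.0, 500.0, 1.2

new_pitch, new_resonance, new_duration = speed_up_playback(
    pitch_hz, resonance_hz, duration_s, factor=2.0
)
print(f"pitch {new_pitch:.0f} Hz, resonance {new_resonance:.0f} Hz, "
      f"duration {new_duration:.2f} s")
```

Doubling the playback speed doubles both frequencies and halves the duration. No playback factor can raise the pitch and resonances while leaving the duration unchanged, which is the situation in sound sample 2.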
The third example demonstrates the effects of modifying the timing of the vowels and consonants within the word “abracadabra” while simultaneously adding a shaky or tremor-like quality, and an increased amount of breathiness. A series of six simulations can be heard in sound sample 3; the first three versions of the word are based on the structure of an unusually large male talker, whereas the second three are representative of an adult female talker.
The simulation model used for these demonstrations was developed for studying and understanding human speech production and speech development. Using the model to investigate extreme cases of structure and unusual timing patterns is useful for better understanding the limits of human speech.
Figure 1 caption:
Unnaturally long and short tube-like representations of the human vocal tract. Each vocal tract is configured as an “ah” vowel (as in “hot”), but during speech the vocal tract continuously changes shape. Vocal tract lengths for typical adult male and adult female talkers are approximately 17.5 cm and 15 cm, respectively. Thus, the 22 cm long tract would be representative of a person with an unusually large head and neck, whereas the 6.6 cm vocal tract is even shorter than that of a typical infant.