ASA Lay Language Papers
162nd Acoustical Society of America Meeting


The human vocal instrument: Visualizing tongue shaping in speech sound production

Daniel Bone – dbone@usc.edu
Michael Proctor – mproctor@usc.edu
Yoon Kim – yoonckim@usc.edu
Shrikanth Narayanan – shri@sipi.usc.edu
Viterbi School of Engineering,
University of Southern California,
Los Angeles, CA 90089, USA.
http://sail.usc.edu/
Popular version of paper 4pSCb7
Presented Thursday, 03 November, 2011
162nd Meeting of the Acoustical Society of America, San Diego, Calif.

We effortlessly move our vocal organs to produce speech at rates in excess of 100 words per minute, without consciously thinking about how we do it. Many of the sounds we produce – such as the consonants at the beginning of the English words sip and ship – involve very fine differences in the way we place our tongue tip on the palate, shape the tongue body in the middle of the mouth, and coordinate these movements with the lips and the airstream. Because speech movements occur very rapidly, and because of the difficulty of reliably tracking the location of the vocal organs inside the mouth, it has proven difficult to study the way that speakers articulate the tongue in sufficient detail to understand the basis of the contrast between certain sounds. These difficulties increase when speech is sung, or when the ability to produce speech is compromised due to disease or disability.

Magnetic Resonance Imaging (MRI) has revolutionized the study of speech. Different variants of the technique offer different insights into how speech is produced, in both space and time. Structural MRI allows us to view part of a speaker’s tongue while they produce sustained sounds (Baer et al. 1991; Alwan et al. 1997), and recent innovations in fast imaging have produced movies of the tongue, allowing us to observe and measure the ‘vocal dance’ during speech in real time (Narayanan et al. 2004; http://sail.usc.edu/span/). However, MRI typically provides only a two-dimensional view through the mouth, which is not sufficient to properly understand the intricate contrast between the vowels in the words beat and bit, for example, or the differences between the consonants at the beginning of the words leap and reap – sounds which all involve changes in the three-dimensional shaping of the tongue.

To provide more insights into these speech contrasts, we have developed a method for automatically producing high resolution three-dimensional models of tongue surfaces from MRI data. Speakers of American English spoke words containing different vowels and consonants while lying in an MRI scanner. A special technique was used to image a full three-dimensional volume of their head – including the tongue, jaw, throat, lips and nasal cavity – while they sustained the sound of interest for 6 seconds. For each volume, a point in the middle of the vocal tract airway was selected, and a surface was automatically fit to the soft tissue in the mouth, using a region-growing algorithm. Using knowledge about the location of each speaker’s palate, jaw and teeth, the section of the smoothed surface corresponding to the tongue was extracted from each volume.
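
The region-growing step described above can be illustrated with a minimal Python sketch. It is not the implementation used in this study: the function names, the fixed intensity threshold, and the toy volume are all assumptions made for illustration. The sketch relies only on the fact that air appears dark in MRI while soft tissue appears bright. Starting from a seed point in the airway, it collects every connected dark voxel and then keeps the voxels that touch tissue, which gives a crude airway-tissue boundary.

```python
from collections import deque

import numpy as np


def grow_region(volume, seed, threshold):
    """Return a boolean mask of voxels 6-connected to `seed` whose
    intensity is below `threshold` (hypothetical airway criterion)."""
    volume = np.asarray(volume)
    mask = np.zeros(volume.shape, dtype=bool)
    if volume[seed] >= threshold:
        return mask  # the seed is not in a dark (air) region
    mask[seed] = True
    queue = deque([seed])
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
               (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    while queue:
        z, y, x = queue.popleft()
        for dz, dy, dx in offsets:
            n = (z + dz, y + dy, x + dx)
            inside = all(0 <= c < s for c, s in zip(n, volume.shape))
            if inside and not mask[n] and volume[n] < threshold:
                mask[n] = True
                queue.append(n)
    return mask


def boundary_voxels(mask):
    """Region voxels with at least one face neighbour outside the region."""
    padded = np.pad(mask, 1, constant_values=False)
    interior = padded[1:-1, 1:-1, 1:-1].copy()
    for axis in range(3):
        for shift in (1, -1):
            # A voxel stays "interior" only if this neighbour is also in the region.
            interior &= np.roll(padded, shift, axis=axis)[1:-1, 1:-1, 1:-1]
    return mask & ~interior


# Toy volume: bright "tissue" (1.0) containing a dark 3x3x3 "airway" (0.0)
# and one isolated dark voxel that is not connected to the airway.
volume = np.ones((7, 7, 7))
volume[1:4, 1:4, 1:4] = 0.0
volume[5, 5, 5] = 0.0

airway = grow_region(volume, seed=(2, 2, 2), threshold=0.5)
surface = boundary_voxels(airway)
```

In this toy case the grown region covers the connected cavity only and leaves out the isolated dark voxel, which is the property that makes seeding inside the vocal tract useful. Real MRI volumes would need a data-dependent threshold and smoothing of the resulting surface, as the text describes.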

The result was a set of three-dimensional volumes showing the differences in tongue shaping which are used to contrast different vowels and consonants in English. These models offer several important insights into tongue geometry which cannot be obtained from two-dimensional MRI images, such as the way the sides of the tongue are raised or lowered, and the depth and shape of the groove in the middle of the tongue – insights which are important for enhancing our understanding of the ways that speakers of different languages contrast different sounds.


Above: MRI slice through the middle of the head (viewed through the speaker’s left cheek), showing the shape of the tongue in the vowel in the word bat.

Three-dimensional views of a speaker’s vocal tract showing the production of the vowel in the word bat. Left: front of tongue seen through open lips. Center: top view of the vocal tract (looking down through the top of the head). Right: view through front left cheek of the cavity formed between the top of the tongue and the roof of the mouth.

(Research supported by the National Institutes of Health).
