Colette Feehan –
Indiana University

Popular version of paper 2pSCb4
Presented Tuesday afternoon, December 8, 2020
179th ASA meeting, Virtually Everywhere

Many people do not realize that the “children” they hear in animation are actually voiced by adults [1]. There are several reasons for this: children cannot work long hours, are difficult to direct, and their voices change as they grow. Using an adult who can simulate a child voice bypasses these issues, but, surprisingly, not all voice actors (VAs) can create a believable child voice.

Studying what VAs do can tell us about how the vocal tract works, because they can speak intelligibly while contorting their mouths in unusual ways. A few previous studies [2-10] have looked at the acoustics of VAs, that is, the sounds they produce, such as changes in pitch, voice quality (how raspy or breathy a voice sounds), and the regional dialects they use. This study uses 3D ultrasound and acoustic data from three professional and three amateur VAs to begin answering the question: What do voice actors do with their vocal tracts to sound like a child? There are multiple strategies for making the vocal tract sound smaller, and different actors combine them in different ways to create their child-like voices.

Looking at both the acoustics (the sounds they produce) and ultrasound images of their vocal tracts, the strategies identified so far are gesture fronting and raising, and hyoid bone raising.

Gesture fronting and raising refers to the position of the tongue within the mouth during speech. If you repeat “ta ka ta ka…” you will notice that your tongue touches the roof of your mouth in a different place for each consonant: farther forward for “ta” and farther back for “ka.” The same is true for vowels. Figure 1, based on acoustic analysis of one subject’s recordings, shows that the tongue position for the adult voice and the child voice is quite different for the vowels [i] (as in “beet”) and [ɑ] (as in “father”). With this information, we can then look at the ultrasound images and confirm that the tongue is indeed farther forward (toward the right of the image) or higher in the mouth for the child voice, as shown in Figure 2.

The hyoid bone is a small bone above the larynx in your neck. Bone blocks the ultrasound signal, so the hyoid does not show up as an image, but the location of its “shadow” can still give us information. If the hyoid shadow is raised and fronted, as in Figure 3, the actor may be shortening their vocal tract by contracting muscles in their throat.

Figure 4 shows that, for this VA, the hyoid shadow was higher throughout the entire utterance when producing a child voice, suggesting that the actor may keep the vocal tract physically shortened for as long as they are speaking.
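Why would a shorter vocal tract sound more child-like? A common textbook simplification treats the vocal tract as a uniform tube, closed at the glottis and open at the lips, whose resonances (formants) fall at F_n = (2n - 1)c / 4L, where c is the speed of sound and L is the tract length. The sketch below illustrates that simplified model only; it is not the analysis used in this study, and the tube lengths (17.5 cm and 14 cm) and c of about 35,000 cm/s are assumed round values, not measurements from these actors.

```python
# Uniform-tube (quarter-wavelength) model of vocal tract resonances.
# Assumption: a textbook simplification, not the method used in this study.

# Approximate speed of sound in warm, humid air, in cm/s (assumed round value)
SPEED_OF_SOUND_CM_S = 35000.0


def tube_formants(length_cm: float, n_formants: int = 3) -> list[float]:
    """Resonances (Hz) of a uniform tube closed at one end and open at the other."""
    if length_cm <= 0:
        raise ValueError("length_cm must be positive")
    # F_n = (2n - 1) * c / (4 * L) for n = 1, 2, 3, ...
    return [
        (2 * n - 1) * SPEED_OF_SOUND_CM_S / (4 * length_cm)
        for n in range(1, n_formants + 1)
    ]


if __name__ == "__main__":
    # Illustrative lengths only (hypothetical, not measured in this study)
    for label, length in [("adult-length tract", 17.5), ("shortened tract", 14.0)]:
        formants = tube_formants(length)
        print(label, [round(f) for f in formants])
```

With these assumed values, the 17.5 cm tube resonates at 500, 1500, and 2500 Hz and the 14 cm tube at 625, 1875, and 3125 Hz: shortening the tube raises every resonance by the same proportion. Real vocal tracts are not uniform tubes, which is why tongue position (gesture fronting and raising) also shifts the formants.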

Data from VAs can help identify alternative ways of producing speech sounds. These could help people with speech impediments, and could also help trans individuals achieve voices that sound closer to their identity.


  1. Holliday, C. (2012). Emotion capture: Vocal performances by children in the computer-animated film. Alphaville: Journal of Film and Screen Media, 3. ISSN: 2009-4078.
  2. Starr, R. L. (2015). Sweet voice: The role of voice quality in a Japanese feminine style. Language in Society, 44(1), 1-34.
  3. Teshigawara, M. (2003). Voices in Japanese animation: A phonetic study of vocal stereotypes of heroes and villains in Japanese culture. Dissertation.
  4. Teshigawara, M. (2004). Vocally expressed emotions and stereotypes in Japanese animation: Voice qualities of the bad guys compared to those of the good guys. Journal of the Phonetic Society of Japan, 8(1), 60-76.
  5. Teshigawara, M., & Murano, E. Z. (2004). Articulatory correlates of voice qualities of good guys and bad guys in Japanese anime: An MRI study. In Proceedings of INTERSPEECH (pp. 1249-1252).
  6. Teshigawara, M., Amir, N., Amir, O., Wlosko, E., & Avivi, M. (2007). Effects of random splicing on listeners’ perceptions. In 16th International Congress of Phonetic Sciences (ICPhS).
  7. Teshigawara, M. (2009). Vocal expressions of emotions and personalities in Japanese anime. In K. Izdebski (Ed.), Emotions of the Human Voice, Vol. III: Culture and Perception (pp. 275-287). San Diego: Plural Publishing.
  8. Teshigawara, K. (2011). Voice-based person perception: Two dimensions and their phonetic properties. ICPhS XVII, 1974-1977.
  9. Uchida, T. (2007). Effects of F0 range and contours in speech upon the image of speakers’ personality. Proceedings of the 19th International Congress on Acoustics (ICA), Madrid.
  10. Lippi-Green, R. (2011). English with an accent: Language, ideology and discrimination in the United States. Retrieved from