Natalie Fecher – natalie.fecher@utoronto.ca
Angela Cooper – angela.cooper@utoronto.ca
Elizabeth K. Johnson – elizabeth.johnson@utoronto.ca

University of Toronto
3359 Mississauga Rd.,
Mississauga, Ontario L5G 4K2 CANADA

Popular version of paper 2pSC34
Presented Tuesday afternoon, November 6, 2018, 2:00-5:00 PM, UPPER PAVILION (VCC)
176th ASA Meeting, Victoria, Canada

Parents will tell you that a two-year-old’s birthday party is a chaotic place: young children running around, parents calling out to their children. Amidst that chaos, if you heard a young child calling out, asking to go to the bathroom, would you be able to recognize who’s talking without seeing their face? Perhaps not as easily as you might expect, suggests new research from the University of Toronto.

Adults are very adept at recognizing other adults from their speech alone. However, children’s speech differs substantially from adults’ speech. The differences arise from the size of their vocal tracts, from how well they can control their articulators (e.g., the tongue) to form speech sounds, and from their linguistic knowledge. As a result, a child may pronounce words like elephant and strawberry more like “ephant” and “dobby”. We know very little about how these differences between child and adult speech might affect our ability to recognize who’s talking. Previous work from our lab demonstrated that even mothers are less accurate than you might expect at identifying their own child’s voice.

Sample of 4 adult voices 

4 child voices producing the word ‘elephant’

In this study, we used two tasks to shed light on differences between child and adult voice recognition. First, we presented adult listeners with pairs of either child or adult voices to determine if they could even tell them apart. Results revealed that listeners were substantially worse at differentiating child voices relative to adult voices.

The second task had new adult listeners complete a two-day voice learning experiment, in which they were trained to identify a set of 4 child voices on one day and 4 adult voices on the other day. Listeners first heard each voice producing a set of words while seeing a cartoon image on the screen, so they could learn the association between the cartoon and the voice. During training, they heard a word and saw a pair of cartoon images, after which they selected who they thought was speaking and received feedback on their accuracy. Finally, at test, they heard a word, saw 4 cartoon images on the screen, and selected who they thought was speaking (Figure 1).

Children’s voices

Figure 1. Paradigm for the voice learning task

Results showed that with training, listeners can learn to identify children’s voices above chance, though child voice learning was still slower and less accurate than adult voice learning. Interestingly, no relationship was found between a listener’s voice learning performance with adult voices and their voice learning performance with child voices: those who were relatively good at identifying adult voices were not necessarily also good at identifying child voices.

This suggests that the information in the speech signal that we use to differentiate adult voices may not be as informative for identifying child voices. Successful child voice recognition may require re-tuning our perceptual system to pay attention to different cues. For example, it may be more helpful to attend to the fact that one child makes certain pronunciation errors, while another child makes a different set of pronunciation errors.
