ASA Lay Language Papers
167th Acoustical Society of America Meeting

The Encoding of Non-categorical Aspects of Speech and its Maintenance in Memory

Georgia Zellou –
University of Pennsylvania & University of California, Davis
3401 Walnut St.
Philadelphia, PA 19104

Delphine Dahan –
University of Pennsylvania
3401 Walnut St.
Philadelphia, PA 19104

Popular version of paper 2aSC1
Presented Tuesday morning, May 6, 2014
167th ASA Meeting, Providence


For most of us, speech comprehension appears to proceed smoothly, with words revealing themselves as they are heard and in the order they are presented. This impression is misleading, however. Speech is a noisy signal: the way a word sounds varies greatly from person to person, and even from moment to moment for the same talker. Some of these differences can be safely ignored, while others must be attended to in order to decide which of many possible words was intended. In this research, we show that people hold onto those differences well past the end of a word and while interpreting subsequent words, in effect maintaining uncertainty about a word's identity until more information about the neighboring words is gathered. Sentences, rather than individual words, are the true products of our perception.

Our investigation examined people’s interpretation of speech in real time. People heard a sentence while viewing a set of pictured objects on a computer screen and clicked on the object mentioned in the sentence. Sentences came in pairs, for example “he bet his money” and “he bent his knee,” whose beginnings were identical up to a critical word (here, either “bet” or “bent,” which differ in the presence or absence of the “n”). Using an eye-tracking device, we assessed how long it took people to direct their gaze to the referred-to object (either “money” or “knee”). Importantly, on test trials the sound of the critical word (“bet”) was artificially manipulated to make it more or less similar to its counterpart (“bent”). The same sentence was presented, “he bet his money,” but one of three versions of “bet” could be heard: one unequivocally perceived as “bet,” one also perceived as “bet” but closer in sound to “bent,” and one truly ambiguous between “bet” and “bent.”

In all three versions of the test trials, people clicked on the picture of money. However, the speed and ease with which they decided they had heard “money,” as reflected by their gaze, varied across the versions. Compared to the unequivocal version of “bet,” listeners were slower to look at the picture of money after hearing the ambiguous “bet.” Critically, they were also delayed (albeit to a smaller degree) after hearing the “bet” that was modified to be slightly closer to “bent,” compared to the unequivocal “bet.” This latter result is significant because both of these versions were always heard as “bet.” The perception of “money” depended on the clarity of the preceding “bet.”

This research reveals the following: as people recognize words in speech, they remember small differences in sounds after a word has been pronounced, and these differences affect the interpretation of subsequent words. Sensitivity to non-contrastive phonetic differences, those that do not result in perceiving different words, has been demonstrated before. Here, we go beyond this phenomenon by showing that these details are held in memory in order, we argue, to resolve uncertainty inherent in the speech signal by considering neighboring words.