We Can’t Help Imitating the Talkers We Hear – and See
Lawrence D. Rosenblum - rosenblu@citrus.ucr.edu
Rachel M. Miller - rmiller.ucr.grad@gmail.com
Kauyumari Sanchez - ksanc004@student.ucr.edu
Department of Psychology
University of California, Riverside
Riverside, California 92521
USA
Popular version of paper 2aSCb2
"Amodal specification of talker-specific motor behavior"
Presented Tuesday morning, July 1, 2008, Palais des Congrès, Room 242A, Acoustics’08 Paris
When talking with someone, we often inadvertently imitate aspects of their speaking style, including its speed, intonation, and even accent. While previous research has shown that this imitation can be based on the speech we hear, new research suggests that we also unconsciously imitate talkers based on the speech we see, or lipread. Furthermore, this imitation of lipread speech occurs for all of us, regardless of our level of hearing, and even in the non-social context of watching a talker on videotape.
In one experiment, a group of 16 naïve subjects with good hearing was first asked to read aloud a list of 80 simple words (‘tennis’, ‘castle’) presented as text on a computer screen. The subjects’ spoken words were audio-recorded for later comparison.
Subjects were next asked to utter the same words, but this time based on lipreading a videotaped model. In this task, subjects first saw two text words presented side by side on a video monitor (‘tennis’; ‘table’). Immediately afterward, they saw a short video clip of the model’s face silently articulating one of the words (‘tennis’). Subjects were asked to choose which of the two words the model was saying and to indicate their choice by saying the lipread word out loud, quickly and clearly. They were not told to imitate, or even to repeat, the model. The subjects’ spoken words were again audio-recorded for later comparison.
To determine whether the subjects inadvertently imitated the model they lipread, a group of 32 naïve raters was brought into the lab. These raters listened to each of the subjects’ recorded words from both tasks (the words spoken when reading the text and the words spoken when lipreading the model) and judged which sounded more like the corresponding word produced by the model (which was also audio-recorded). Comparisons were made one word at a time, and raters did not know which task each word came from. The raters judged the words subjects produced when lipreading as sounding more like the model’s than the words produced when reading. This suggests that subjects inadvertently imitated aspects of the model’s lipread speech, despite never being asked to do so and without any auditory speech being present.
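For readers curious how such forced-choice judgments might be tallied, the following is a minimal Python sketch; the counts are entirely hypothetical (not the study’s data), and it simply asks whether the proportion of comparisons favoring the lipreading-task words exceeds chance.

    # Minimal sketch with hypothetical numbers: score raters' forced choices
    # and test whether the lipreading-task words were judged more model-like
    # than chance (50%) would predict.
    from scipy.stats import binomtest

    n_comparisons = 200   # hypothetical total number of rater judgments
    chose_lipread = 124   # hypothetical count favoring the lipreading-task word

    result = binomtest(chose_lipread, n_comparisons, p=0.5, alternative="greater")
    print(f"proportion = {chose_lipread / n_comparisons:.2f}, p = {result.pvalue:.4f}")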
A second experiment addressed whether lipread information can influence unconscious imitation of a very subtle and linguistically important aspect of speech. One distinction between the speech sounds ‘b’ and ‘p’ is the delay between when the lips part and when the vocal cords begin to vibrate (a property known as voice onset time). For ‘b’, the lips part and the vocal cords start vibrating at essentially the same time. For ‘p’, there is a short delay (less than 1/10th of a second) after the lips part before the vocal cords start vibrating. The exact length of this delay for ‘p’ depends partly on a talker’s speaking style.
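As a rough illustration of how this delay can be measured from a recording, here is a minimal Python sketch; the file name, frame size, and thresholds are hypothetical choices, and the method is a crude stand-in for the careful acoustic analysis used in phonetics research.

    # Minimal sketch: estimate voice onset time (VOT) from a recorded 'pa'.
    # Assumes a mono WAV file; all thresholds here are illustrative guesses.
    import numpy as np
    from scipy.io import wavfile

    def estimate_vot(path, frame_ms=5):
        rate, signal = wavfile.read(path)
        signal = signal.astype(float)
        signal /= np.abs(signal).max() or 1.0

        frame_len = int(rate * frame_ms / 1000)
        n_frames = len(signal) // frame_len
        frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)

        # The release burst of 'p' appears as the first frame whose energy
        # rises well above the silent baseline at the start of the file.
        energy = (frames ** 2).mean(axis=1)
        burst = int(np.argmax(energy > 10 * energy[:5].mean()))

        # Voicing onset: the first later frame with sustained energy and a low
        # zero-crossing rate (a crude proxy for periodic vocal-fold vibration).
        zcr = (np.abs(np.diff(np.sign(frames), axis=1)) > 0).mean(axis=1)
        voiced = (energy > 0.1 * energy.max()) & (zcr < 0.1)
        voicing = int(np.argmax(voiced[burst:])) + burst

        return (voicing - burst) * frame_ms   # VOT in milliseconds

    print(f"estimated VOT: {estimate_vot('pa.wav'):.1f} ms")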
With regard to talker imitation, past research has shown that when asked to identify a heard word beginning with ‘p’ by saying it out loud, we inadvertently imitate the talker’s vocal cord delay time. This shows that talker imitation can occur for a very subtle speech characteristic that serves an important function (potentially distinguishing ‘b’ from ‘p’).
To determine whether lipread speech can also influence a subject’s vocal cord delay, seventeen subjects with good hearing were presented with a videotape, with sound, of a model producing the syllables ‘pa,’ ‘ta,’ and ‘ga.’ The audio components of the ‘pa’ syllables were electronically manipulated to sound as if they had slightly different vocal cord delay times. These audio ‘pa’ syllables were dubbed onto video ‘pa’ syllables recorded from a model speaking ‘pa’ at both a fast and a slow rate (speaking rate is known to influence vocal cord vibration delay time). Subjects were asked to watch and listen to each audiovisual syllable and then to say out loud the syllable they perceived, quickly and clearly. Again, subjects were not asked to imitate the speech in any way.
Acoustic analyses of the subjects’ recorded utterances revealed that when producing a ‘pa’ syllable, the amount of vocal cord delay subjects used was systematically affected by both the audio and video components of the speech they were presented. This finding suggests that lipread information from a speaker’s face can induce imitation of this very subtle, yet linguistically important, speech characteristic.
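The kind of question such an analysis asks can be sketched in a few lines of Python; the measurements below are invented purely for illustration and do not come from the study.

    # Minimal sketch with hypothetical measurements: ask whether produced VOT
    # tracks both the audio VOT of the stimulus and the model's visual rate.
    import numpy as np
    from scipy import stats

    # Each entry is one hypothetical produced 'pa' token (VOT in ms).
    produced_vot = np.array([52.0, 61.0, 48.0, 70.0, 55.0, 66.0, 50.0, 68.0])
    audio_vot    = np.array([40.0, 80.0, 40.0, 80.0, 40.0, 80.0, 40.0, 80.0])
    fast_video   = np.array([1,    0,    1,    0,    1,    0,    1,    0])

    # Audio effect: do longer stimulus VOTs go with longer produced VOTs?
    print(stats.pearsonr(audio_vot, produced_vot))

    # Video effect: do fast-rate videos go with shorter produced VOTs?
    print(stats.ttest_ind(produced_vot[fast_video == 1],
                          produced_vot[fast_video == 0]))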
These results show that inadvertent imitation of a talker can be based on either the heard or the lipread speech of that talker. They are consistent with a substantial body of research showing that, despite limitations on our lipreading skills, we all automatically use some lipread information. They further show that this lipread information is powerful enough to automatically change the speech we produce.
The findings also add to recent results showing that lipread information can tell us something about what is being said (the speech message), as well as who is saying it (talker properties). Finally, the results are consistent with findings outside the speech domain. Research on social interactions shows that we use visual information when we inadvertently imitate the body language and facial expressions of individuals with whom we are interacting. These new lipreading findings show that this inadvertent imitation also holds for the visible facial information that conveys speech.