ASA PRESSROOM

Acoustical Society of America
139th Meeting Lay Language Papers


When Lipreading Words is as Accurate as Listening

Sven Mattys - smattys@hei.org
Lynne E. Bernstein
Edward T. Auer, Jr.
Department of Communication Neuroscience
House Ear Institute
2100 West Third Street
Los Angeles, CA 90057

Popular version of paper 4aSC10
Presented Friday morning, June 2, 2000
139th ASA Meeting, Atlanta, Georgia
For additional information, please contact Dilys Jones at (213) 353-7012


A version of this work has been submitted to Academic Press for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

Abstract

The capacity to understand speech from the moving lips of a talker (lipreading) is thought to be extremely difficult. Popular imagery has typically represented lipreading as the result of very complex computations (e.g., the supercomputer HAL in "2001: A Space Odyssey"). In our research, we found that, despite the impoverishment of visual speech compared to auditory speech, it is possible to predict which words will be recognized most accurately, based on computational estimates of words' visual distinctiveness from the other words of the dictionary. Specifically, we predicted that words that are visually distinct from all other words should be identified almost as well as their auditory counterparts. This is exactly what we found. Moreover, deaf and hearing participants performed in very similar ways. The results demonstrate that: (1) lipreading is more predictable than it is thought to be, (2) lipreading can be very accurate for words that are visually distinct from other words, and (3) lipreading shows strong similarities with the mechanisms involved in the recognition of auditory words.



Visual Similarity of Speech Sounds and Assistance from the Lexicon

Below is an illustration of the problem that lipreaders face when they try to recognize words from moving lips. Visually, the words "bat" and "pat" are highly similar. This is so because the articulatory gestures that distinguish the speech categories (or phonemes) "B" and "P" take place in a location of the vocal tract hidden from view: the vocal cords. To produce the phonemes "B" and "P", the lips come together and come apart suddenly. The vocal cords begin vibrating as soon as the lips part for "B", but their vibration is delayed for "P".

[Video clip: "BAT"]
[Video clip: "PAT"]

However, the problem of similar appearing phonemes can be solved for many words in our vocabulary (or mental lexicon). For example, the confusion between "B" and "P" illustrated above is unlikely to happen when the word "brief" is spoken, because a word like "prief" does not exist in the lexicon. Thus, the content of the lexicon can dramatically reduce the problem of visual impoverishment that occurs for visual speech.
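The reasoning above can be sketched in a few lines of code. This is a minimal illustration, not the authors' actual model: the viseme classes and the toy lexicon below are invented for the example, and real phoneme spellings are simplified to single letters.

```python
from itertools import product

# Hypothetical viseme classes: phonemes within one set look identical
# on the lips (e.g., "B" and "P" differ only at the hidden vocal cords).
VISEME_CLASSES = [{"B", "P", "M"}, {"F", "V"}, {"T", "D", "N"}]

def visual_class(phoneme):
    """Return the set of phonemes visually indistinguishable from this one."""
    for cls in VISEME_CLASSES:
        if phoneme in cls:
            return cls
    return {phoneme}  # a visually distinctive phoneme is a class of its own

def visual_candidates(word):
    """All phoneme strings that look identical to `word` on the lips."""
    return {"".join(p) for p in product(*(visual_class(ph) for ph in word))}

# Toy lexicon for illustration only.
lexicon = {"BAT", "PAT", "MAT", "BRIEF"}

# "BAT" has real visual competitors, so lip information alone is ambiguous...
print(sorted(visual_candidates("BAT") & lexicon))    # ['BAT', 'MAT', 'PAT']
# ...but "BRIEF" has none: "PRIEF" and "MRIEF" are not words in the lexicon.
print(sorted(visual_candidates("BRIEF") & lexicon))  # ['BRIEF']
```

The key point is the final intersection with the lexicon: visually identical candidates that are not real words simply drop out, which is how the mental lexicon can resolve ambiguity that the visual signal cannot.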

[Video clip: "BRIEF"]



Experiment

In this study, we showed that visual word recognition accuracy can be predicted based on (1) phoneme similarity estimates, and (2) an approximation of the content of the American English lexicon.

Participants.
Eight individuals with profound hearing impairments (deaf) and eight individuals with normal hearing were tested. All were undergraduate students at California State University, Northridge. Mean age was 22.5 years (range: 19 to 26) for the deaf group, whose hearing impairments were pre-lingual in onset, and 23.5 years (range: 21 to 28) for the hearing group. All of the participants were at least good lipreaders.

Experimental Procedure.
Participants were seated in front of a computer monitor in a quiet room. A total of 282 words, presented one at a time, were spoken by a female talker, with her face filling most of the monitor frame. The sound was turned off. The participants' task was to identify each word the talker said by typing it on a computer keyboard. After entering a response, the participants pressed a keyboard key to see the next word.

Stimuli.
Three types of words were presented, as a function of their similarity to other words in the lexicon (i.e., the number of visual competitors):

- UNIQUE: words with no visual competitors
- MEDIUM: words visually similar to 1 to 5 other words
- HIGH: words visually similar to 9 or more other words

The number of competitors was estimated based on computer modeling of the lexicon, which used data on visual confusions among consonant-vowel syllables ("ba", "da", "ta", etc.).
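The counting and binning step can be sketched as follows. This is a hedged illustration under invented assumptions: the viseme map and the toy lexicon are made up for the example, whereas the study derived its confusion classes from actual consonant-vowel syllable data.

```python
from itertools import product

# Toy viseme map: phonemes sharing a string look identical on the lips.
VISEMES = {"B": "BPM", "P": "BPM", "M": "BPM",
           "T": "TDN", "D": "TDN", "N": "TDN"}

def competitors(word, lexicon):
    """Count the OTHER lexicon words visually identical to `word`."""
    lookalikes = {"".join(c) for c in product(*(VISEMES.get(p, p) for p in word))}
    return len(lookalikes & lexicon) - 1  # subtract the word itself

def category(n):
    """The study's three stimulus bins by number of visual competitors."""
    if n == 0:
        return "UNIQUE"
    if n <= 5:
        return "MEDIUM"
    return "HIGH"  # the study's HIGH bin required 9 or more competitors

# Illustrative lexicon (not the study's).
lexicon = {"BAT", "PAT", "MAT", "BAD", "BRIEF"}

print(category(competitors("BAT", lexicon)))    # 3 competitors -> MEDIUM
print(category(competitors("BRIEF", lexicon)))  # 0 competitors -> UNIQUE
```

In the real study the same logic was run over an approximation of the full American English lexicon, with confusion probabilities rather than all-or-none viseme classes.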

In addition, half the test words were chosen from frequently occurring words (used daily in conversational English) and the other half from infrequently occurring words. Words were either one or two syllables long.

Results.
Word recognition accuracy, in percent words correct, is shown in the graphs below, separately for the deaf and the hearing participants.

These graphs underscore several findings:

  • Word recognition accuracy is greatly influenced by the number of visual competitors of a test word, with accuracy being much higher when the test word is visually distinct from all other words in the lexicon (unique) than when it has some (medium) or many (high) visual competitors.
  • Word recognition accuracy is influenced by the frequency of occurrence of the test words: frequently occurring words are recognized more readily than infrequently occurring words.
  • Everything else being equal (number of visual competitors and frequency), word length does not seem to influence recognition accuracy.
  • Word recognition accuracy is slightly higher for the deaf than for the hearing participants, although the difference did not reach statistical significance.


Conclusion

Lipreading is far more than a "guessing game": Visual word recognition depends on more than contextual information and strategic processes. This experiment shows that word lipreading is amenable to scientific investigation. Performance can be predicted based on computer analyses of the lexicon and of the typical patterns of phoneme similarity. For example, as predicted, several test words labeled as "unique" were identified correctly by all of the participants (e.g., brief, floor, special, question). In contrast, a majority of test words that were predicted to be similar to many other words were recognized only occasionally, or not at all (e.g., news, best, hidden, basic).

Likewise, the frequency of occurrence of the test words contributed to recognition accuracy. Independently of their number of visual competitors, frequent words were recognized more accurately than infrequent words.

Effects of lexical similarity and of frequency of occurrence are mechanisms that have been observed repeatedly in the auditory word recognition domain. The present results indicate that these two principles apply to "word recognition" in general, regardless of the perceptual modality, regardless of the specific patterns of phoneme similarity, and regardless of the hearing status of the perceiver (deaf vs. hearing).

This work was supported by NIH Grant # DC02107.

