143rd ASA Meeting, Pittsburgh, PA


Read My Lips: Computer Animated Tutors Teach Language

Dominic W. Massaro
Dept. of Psychology, Univ. of California, Santa Cruz, CA 95064, (831) 229-1666

Alexis Bosseler, Dept. of Psychology, University of California, Santa Cruz, Santa Cruz, CA 95064
Patrick S. Stone, Tucker-Maxon Oral School, 2860 SE Holgate Boulevard, Portland, OR 97202
Pamela Connors, Tucker-Maxon Oral School, 2860 SE Holgate Boulevard, Portland, OR 97202

Popular version of paper 5aSC19
Presented Friday morning, June 7, 2002

Imagine an effective tutor who is personable, engaging, and yet always available 24 hours a day, seven days a week. This tutor doesn't get tired, bored, or impatient. Better yet, the cost for this one-on-one time with a child is only a few cents an hour. This tutor exists because of facial animation, speech synthesis, and the application of several decades of speech science, linguistics, and psychology. Computer-assisted speech and language tutors are now helping children who are hard of hearing, children with autism, and children with language delays.

This language-training program uses a computer-animated talking head, Baldi, as the conversational agent. Baldi guides students through a variety of exercises designed to teach vocabulary and grammar, to improve speech articulation, and to develop linguistic and phonological awareness. Baldi is an accurate three-dimensional animated talking head whose visible speech can be aligned with either synthesized or natural auditory speech. He has a tongue and palate, which can be displayed by making his skin transparent. The quality and intelligibility of Baldi's visible speech have been repeatedly evaluated and modified so that it accurately simulates a naturally talking human. This technology has the potential to help individuals with language delays and deficits, and we report the results of experiments that used Baldi to carry out language tutoring.


There are compelling reasons to believe that Baldi and other animated agents can improve learning and language training. Human faces enrich interpersonal communication because they are informative, emotional, and personal. We often communicate better in face-to-face situations because we are able to combine many sources of information to perceive and understand, even when some of the information is ambiguous or fuzzy. During speech, the face is linguistically informative, and the auditory and visual features of speech are often complementary. For example, the difference between "bad" and "dad" is easy to see but relatively difficult to hear. On the other hand, the difference between "bat" and "pat" is relatively easy to hear but difficult, if not impossible, to see. Effective language training specialists attend to these auditory and visual features of speech to judge the quality of their students' productions, and they provide feedback in both dimensions, as we can do with animated agents. In fact, animated faces can provide feedback that humans cannot, by turning semi-transparent to show the movements of the tongue within the mouth from different angles, or by presenting visual patterns that represent acoustic-phonetic features of sounds.

Animated faces can also communicate emotional content, a powerful source of information. As an example, artists and producers use emotional milestones expressed visually, rather than spoken dialogue, to design storyboards for animated productions. Animated agents can speak to the heart (the emotional content of a message) and to the brain (the intellectual content), thereby increasing the amount and quality of information conveyed. Animated agents also bring a personal dimension to human-computer interaction. People personalize computers: we attribute personal characteristics to programs based on our interactions with them. We have observed that this effect is greatly intensified when an animated agent is involved.

The language-tutorial application, the Vocabulary Wizard/Tutor, allows easy creation and presentation of a language lesson that associates pictures with spoken words. The lesson plan includes both the identification of pictures and the production of spoken words. The lesson designer imports a visual image and determines which parts of the image will be associated with spoken words or phrases. Figure 1 shows a view of the screen in a prototypical application in which the students learn to identify prepositions such as inside, next to, and in front of. The region outlined in orange is the selected region. The faces in the left-hand corner of the figure are the "stickers," which show a happy or a sad face as feedback for correct and incorrect responses. All of the exercises require the child to respond to spoken directives such as "click on the little chair" or "find the red fox." An item becomes highlighted whenever the child moves the mouse over its region. The student responds by clicking on one of the designated areas, or by touching the monitor when a touch screen is being used.
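To make the picture-and-word association concrete, here is a minimal Python sketch of how a click might be matched to a labeled part of the image and turned into a sticker. It is only an illustration, not the Vocabulary Wizard/Tutor's actual code: the rectangular regions, the coordinates, and the names Region, region_at, and sticker_for_click are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Region:
    """A rectangular part of the lesson image tied to a spoken word or phrase.

    Rectangles are an assumption; the real program may use other outlines.
    """
    label: str
    left: int
    top: int
    width: int
    height: int

    def contains(self, x: int, y: int) -> bool:
        # True when the point (x, y) falls inside this rectangle.
        return (self.left <= x < self.left + self.width
                and self.top <= y < self.top + self.height)


def region_at(regions: Sequence[Region], x: int, y: int) -> Optional[Region]:
    """Return the first region under the pointer, or None if there is none."""
    for region in regions:
        if region.contains(x, y):
            return region
    return None


def sticker_for_click(regions: Sequence[Region], target: str, x: int, y: int) -> str:
    """Return "happy" if the click lands on the requested item, else "sad"."""
    hit = region_at(regions, x, y)
    return "happy" if hit is not None and hit.label == target else "sad"


# Hypothetical lesson with two labeled regions.
regions = [
    Region("little chair", left=40, top=200, width=80, height=100),
    Region("red fox", left=300, top=220, width=120, height=90),
]

# Directive: "click on the little chair"; the child clicks at (50, 210).
print(sticker_for_click(regions, "little chair", 50, 210))
```

The same region_at lookup could also drive the highlighting when the mouse moves over an item.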

There are five application modules: pre-test, presentation, practice, production, and post-test. The Wizard/Tutor comes with easily changeable default settings that determine what Baldi says and how he says it, the feedback given for responses, the number of attempts permitted per question, and the number of times each item is presented. The program also stores the student's performance in a log file.
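The settings and the log described above can be pictured with a short Python sketch. Everything in it is hypothetical: the field names, the default values, and the JSON log format are assumptions chosen for illustration, and only the five module names come from the description above.

```python
import json
from dataclasses import dataclass
from typing import Dict, Sequence

# The five modules named in the text.
MODULES = ("pre-test", "presentation", "practice", "production", "post-test")


@dataclass
class TutorSettings:
    """Changeable defaults; the specific values here are assumed."""
    max_attempts: int = 2               # attempts permitted per question
    presentations_per_item: int = 1     # times each item is presented
    correct_feedback: str = "happy face"
    incorrect_feedback: str = "sad face"


def run_item(item: str, responses: Sequence[str], settings: TutorSettings) -> Dict:
    """Score one item from the student's successive responses."""
    attempts = 0
    correct = False
    # Only the first max_attempts responses count.
    for response in responses[:settings.max_attempts]:
        attempts += 1
        if response == item:
            correct = True
            break
    feedback = settings.correct_feedback if correct else settings.incorrect_feedback
    return {"item": item, "attempts": attempts, "correct": correct, "feedback": feedback}


def log_line(student: str, module: str, record: Dict) -> str:
    """Format one performance record as a JSON line for a log file."""
    if module not in MODULES:
        raise ValueError(f"unknown module: {module}")
    return json.dumps({"student": student, "module": module, **record}, sort_keys=True)


settings = TutorSettings()
record = run_item("red fox", ["little chair", "red fox"], settings)
print(log_line("student-01", "practice", record))
```

Keeping the settings in one object reflects the idea that a lesson designer can change the defaults without touching the lesson content itself.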

This pedagogy is currently integrated into the curriculum of schools for hard-of-hearing children and schools for children with autism. Two specific language-training programs have been evaluated to determine whether they improve word learning and speech articulation. The results for the hard-of-hearing children are shown in Figure 2. They indicate that the program is effective in teaching receptive and productive language. The students identified significantly more vocabulary items during the post-test than during the initial assessment, indicating that the Vocabulary Wizard/Tutor was effective at training vocabulary. Furthermore, 30 days after training, the students were able to recall about 55% of the new vocabulary items that they had learned. An independent evaluation was carried out by ABC PrimeTime, which featured this work on their program. The program can be seen at

Similar results were found for the children with autism. Across all lessons, the students identified significantly more words during the post-test. These children were able to recall 91% of the new vocabulary items 30 days following training.


The autistic students appeared to enjoy working with Baldi. The children made statements like "Hi Baldi" and "I love you Baldi." The stickers generated for correct (happy face) and incorrect (sad face) responses proved to be an effective way to provide feedback for the children. Some students displayed frustration when they received more than one sad face. They also responded positively to the happy faces, saying "Look, I got them all right," or laughing when a happy face appeared.

The advantages of using a computer-animated agent as a language tutor are the popularity of computers and embodied conversational agents with kids, the perpetual availability of the program, and individualized instruction. Students enjoy working with Baldi because he offers extreme patience, he doesn't become angry, tired, or bored, and he is in effect a perpetual teaching machine. The results indicate that the psychology and technology of Baldi hold great promise in language learning and speech therapy.

[Work supported by the National Science Foundation (Grant Nos. CDA-9726363 and BCS-9905176), the Public Health Service (Grant No. PHS R01 DC00236), and the Cure Autism Now (CAN) Foundation.]

