ASA/CAA '05 Meeting, Vancouver, BC

A Robot That Mimics Human Speech

Kotaro Fukui - kotaro@toki.waeda.jp
Kazufumi Nishikawa, Toshiharu Kuwae, Masaaki Honda, Atsuo Takanishi
Waseda University
59-308, Ookubo, Shinjuku-ku, Tokyo, 169-8555, Japan

Hideaki Takanobu
Kogakuin University

Takemi Mochida
Communication Science Laboratories, Nippon Telegraph and Telephone Corporation

Popular version of paper 4aSC1
Presented Thursday morning, May 19, 2005
149th ASA Meeting, Vancouver, BC

1. Introduction

The purpose of this research is to investigate the human vocal mechanism from an engineering viewpoint by reproducing the entire process of articulated speech with a talking robot. A mechanical talking robot has several engineering applications, such as an audio-visual talking head, medical support devices for vocally challenged people, and lifelike learning devices for foreign languages.

2. WT-4 (Waseda Talker No. 4)

We developed an anthropomorphic talking robot, WT-4 (Waseda Talker No. 4), to produce human speech. WT-4 consists of lungs, vocal cords and articulators (tongue, lips, teeth, nasal cavity and soft palate). Together these parts have 19 degrees of freedom (DOF), each DOF being an independent direction of movement. The lips and the tongue are made of elastic material and controlled by a looped wire mechanism (see below). The elastic material and its framework can deform by large amounts, just as the human tongue and lips can, and the elastic material prevents air and sound leaks. The robot's lungs supply the air flow that powers its sounds. The vocal cords contract to produce voiced consonants such as the sound /b/ in "bus," and open to produce voiceless consonants such as the sound /k/ in "walk." The robot's lips, teeth, tongue, nasal cavity and soft palate are all constructed to move like their human counterparts. WT-4 can also produce the stops, fricatives and nasal consonants in all Japanese syllables, as well as the five Japanese vowels, with intelligible sound quality.


[Figure: WT-4 (section view)]
[Figure: Vibration of WT-4's vocal cords (high-speed camera, 1000 fps)]

3. Mimicking Mechanism

WT-4 does not only talk; it also hears and imitates sounds autonomously. The robot tracks acoustic information from a human speaker, then generates sound through its vocal mechanisms. By imitating in this way, it can repeat sentences as a person talks (this is illustrated in the last video demonstration below). The talking robot produces vowel and consonant sounds by mimicking the vibration of the vocal cords and the generation of fricative and plosive sound sources by the air flow, as well as through dynamically controlled acoustic resonance. Articulatory control of the talking robot is designed to track the acoustic goals of the speech: pitch, sound power, two formant frequencies, and voiced/unvoiced timing. This mimicking speech control proved effective in producing fluent continuous speech with the talking robot.
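
To give a rough feel for what "acoustic goals" means in practice, the sketch below estimates three of the quantities named above (pitch, sound power, and a voiced/unvoiced decision) from one short frame of recorded speech. It is only an illustration and not the method used in WT-4: the autocorrelation pitch estimate, the function names, the 70-400 Hz search range and the 0.5 voicing threshold are all assumptions chosen for this example, and formant estimation is left out.

```python
import math

def frame_power(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(x * x for x in frame) / len(frame)

def estimate_pitch(frame, sample_rate, fmin=70.0, fmax=400.0):
    """Return (pitch_hz, strength) from the autocorrelation peak.

    strength is the autocorrelation at the best lag divided by the
    frame energy. fmin and fmax are assumed search limits.
    """
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return 0.0, 0.0  # silent frame: no pitch
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_r = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(frame[i] * frame[i + lag]
                for i in range(len(frame) - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag == 0:
        return 0.0, 0.0  # no positive correlation in the search range
    return sample_rate / best_lag, best_r

def acoustic_targets(frame, sample_rate, voicing_threshold=0.5):
    """Bundle power, voicing decision and pitch for one frame.

    voicing_threshold is an assumed value, not taken from the paper.
    """
    pitch, strength = estimate_pitch(frame, sample_rate)
    voiced = strength >= voicing_threshold
    return {
        "power": frame_power(frame),
        "voiced": voiced,
        "pitch_hz": pitch if voiced else None,
    }

# Usage: a synthetic 200 Hz tone, 40 ms long, sampled at 8 kHz.
tone = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(320)]
print(acoustic_targets(tone, 8000))
```

A controller of the kind described above would compute such targets frame by frame from the human speaker and then adjust the robot's lungs, vocal cords and articulators so that the robot's own sound matches them.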

4. Demonstration of Talking Robot WT-4

The following movies show the talking robot in action:

aiueo (MPEG, 1850 KB)
sasisuseso (MPEG, 2490 KB)
Human Vocal Mimicry "hassei" (MPEG, 2167 KB)