ASA PRESSROOM

Acoustical Society of America
132nd Meeting Lay Language Papers


Investigating the Common Cold to Improve Speech Technology

Renetta Garrison Tull - garrison@ece.nwu.edu
Communication Sciences and Disorders Department
Speech Processing Laboratory (ECE)
Northwestern University
2145 Sheridan Road
Evanston, Illinois 60208-3118
Janet C. Rutledge - rutledge@ece.nwu.edu
Electrical and Computer Engineering Department
Northwestern University
Charles R. Larson - clarson@casbah.acns.nwu.edu
Communication Sciences and Disorders Department
Northwestern University

Popular Version of Paper 4aSC20
Presented Thursday Morning, December 5, 1996
3rd Joint ASA/ASJ Meeting, Honolulu, Hawaii
Embargoed until December 5, 1996

This research looks at the differences between speech affected by a cold and normal, healthy speech so that the information can be used for automatic speaker recognition - one of the world's most exciting emerging technologies. Unlike "personal identity" technologies such as retinal scanning and fingerprinting, speaker recognition can use a piece of equipment that most people already have in their homes - a telephone. Speaker recognition is already being used to confirm that speakers are who they say they are. The recognition system has to be accurate because it is currently being used to detect impostors on cellular phones. The technology is also being used in a home incarceration system for offenders in place of the ankle bracelet formerly used. Because speaker recognition is used in forensic applications, the technology must be able to recognize a speaker even when his or her speech sounds different from the ``normal" voice. This ``different" speech could come from a period when a person's voice reflects fatigue or excitement. The research described here looked at what happened to the speech of people who became sick with strains of the common cold.

People with ``cold-speech" can potentially be identified by a speaker recognition system if the system is is given a pattern for the cold. This project showed that the listeners can hear that there are differences between normal and cold conditions. Examining the speech signal showed that the lengths of the cold and normal sessions are not the same, and computer software showed that portions of the parameters that are used in speaker recognition systems have different values when a speaker has a cold. Mel-cepstral coefficients are popular parameters in speaker recognition research, and they were used in this study to show that there are differences between normal speech and ``cold-speech." Figure 1 shows that the difference between normal sessions is small (first bar: normal-normal). Likewise, the difference between cold sessions is also small (second bar: cold-cold.) But there is a noticeable difference between the normal session and the cold session (third bar: normal-cold.) This figure shows that the differences between normal speech and ``cold-speech" are still evident in the speaker recongnition parameter.


Figure 1.
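
The comparison in Figure 1 can be illustrated with a short computational sketch. The Python code below is not the software used in the study; it is a minimal, hypothetical example (the file names are placeholders, and the librosa library is simply a convenient modern tool) showing how time-averaged mel-cepstral vectors from different recording sessions might be compared.

# Hypothetical sketch (not the study's software): compare time-averaged
# mel-cepstral vectors from different recording sessions of one speaker.
# The WAV file names below are placeholders.
import numpy as np
import librosa

def mean_mfcc(path, n_mfcc=13):
    # Load a recording and average its mel-cepstral coefficients over time.
    y, sr = librosa.load(path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

normal_1 = mean_mfcc("normal_session1.wav")
normal_2 = mean_mfcc("normal_session2.wav")
cold_1 = mean_mfcc("cold_session1.wav")
cold_2 = mean_mfcc("cold_session2.wav")

# Distances analogous to the three bars in Figure 1: small within a
# condition, noticeably larger between the normal and cold conditions.
print("normal-normal:", np.linalg.norm(normal_1 - normal_2))
print("cold-cold:    ", np.linalg.norm(cold_1 - cold_2))
print("normal-cold:  ", np.linalg.norm(normal_1 - cold_1))
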

Understanding how particular mel-cepstral coefficients correlate with the muscles of the larynx would give further insight into the physiological parameters and their role in the sounds that a cold produces. A person with a sore throat may not have any difficulty being recognized by a speaker recognition system, but a person with a combination of a cough and nasal secretions might have trouble being recognized. This project is extending its study to examine the specific impact of the stuffy nose.

Studying the physiological and linguistic elements that correspond to a cold can help researchers understand the behavior of the parameters in a recognition system. The phonetic information is useful because this study showed that some of the consonants produced with the tongue (e.g., the sounds "t" and "l") were not pronounced clearly. This change in the structure of the consonants between normal and cold sessions could make a difference in the high-frequency energy represented by particular cepstral coefficients. The physiological information, the phonetic information, and the signal processing information are all interconnected. Aspects of all three fields should be used to perfect applications in speech technology. A rough illustration of the high-frequency point appears in the sketch below.
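
The following sketch is again hypothetical: the recordings, the 4 kHz cutoff, and the use of the librosa library are assumptions made for illustration. It compares the fraction of high-frequency energy in a consonant segment recorded during a normal session and during a cold.

# Hypothetical sketch: fraction of spectral energy above a cutoff
# frequency for a short consonant segment (e.g., a "t" sound).
import numpy as np
import librosa

def high_freq_ratio(path, cutoff_hz=4000.0):
    y, sr = librosa.load(path, sr=None)
    power = np.abs(np.fft.rfft(y)) ** 2          # power spectrum
    freqs = np.fft.rfftfreq(len(y), d=1.0 / sr)  # frequency axis
    return power[freqs >= cutoff_hz].sum() / power.sum()

# A weakly articulated "t" during a cold would be expected to show a
# lower high-frequency ratio than the same "t" spoken when healthy.
print("normal t:", high_freq_ratio("t_normal.wav"))
print("cold t:  ", high_freq_ratio("t_cold.wav"))
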

The problem of finding a ``voice-print" for a person remains, but projects like this study on colds bring the world of speech technology closer to a solution. This study continues to ``turn over the stones" so that several areas of speech research can be incorporated into the process of improving speaker identification and verification.

Recent Related Publications

Campbell, J. P., "Testing with the YOHO CD-ROM Verification Corpus", Proceedings of ICASSP, 1995.

Cummings, K., and Clements, M., "Analysis of Glottal Waveforms Across Stress Styles", Proceedings, 1990 International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 369-372, April, 1990.

Flanagan, J. L., "Technologies of Multimedia Communication", Proceedings of the IEEE, vol. 82, no. 4, April, 1994.

Furui, S., "An Overview of Speaker Recognition Technology", Proceedings of the ESCA Workshop on Automatic Speaker Recognition, Identification, and Verification, pp. 1-9, April, 1994.

Tyrrell, D. A. J., Cohen, S., and Schlarb, J. E., "Signs and Symptoms in Common Colds", Epidemiology and Infection, vol. 111, no. 1, 1993.