Renetta Garrison Tull - garrison@eecs.nwu.edu
Communications Sciences and Disorders Department
Janet C. Rutledge - rutledge@eecs.nwu.edu
Electrical Engineering and Computer Science Department
Northwestern University
Popular Version of Paper 4aSC20
Thursday Morning, 16 May 1996
ASA Spring Meeting, Indianapolis, IN
Embargoed until 16 May 1996
The goal of this research is to understand the characteristics of speech produced by a person who has a cold, so that automatic speaker recognition systems will be able to recognize that person on both healthy and sick days. Automatic speaker recognition is an exciting technology that uses computers to recognize a person from his or her voice. The speech signal carries information about the speaker's identity, native language, possible speech disorders, physical state, and emotional state.
Speaker recognition technology is currently being designed for applications such as controlling access to privileged databases (personnel records, proprietary data), banking transactions over a telephone network, telephone shopping, voice mail, security control for confidential information areas, and remote access to computers. Applications involving fund transfers, entry to restricted premises, and business transactions over the telephone network require proof of a person's identity. To accommodate these applications, speaker recognition technology is divided into two areas: speaker identification and speaker verification. Speaker identification systems identify a speaker from among a group of speakers in a database, so there are as many possible answers as there are enrolled speakers. Speaker verification systems can make only two choices: "accept" the speaker (a correct match) or "reject" the speaker (an impostor). Because speaker recognition concentrates on choosing the proper speaker, researchers seek unique characteristics that make the differences between speakers as large as possible.
Speaker recognition differs from "speech" recognition, which seeks the proper message (the words). SPEECH recognition systems are designed to make voices sound as similar as possible so that the words can be determined regardless of the speaker. SPEAKER recognition systems enhance the differences between voices so that individual speakers can be identified or verified.
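The difference between the two decision rules described above can be illustrated with a minimal Python sketch. It assumes that each enrolled speaker has already been reduced to a single feature vector (a "template") and uses cosine similarity as the match score. The feature values, speaker names, the 0.9 threshold, and the function names are all hypothetical choices for illustration, not the method used in this study; practical systems typically use richer statistical models, such as the hidden Markov models in the Che and Lin reference.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identify(test_vec, templates):
    """Speaker identification: return the enrolled speaker whose template scores
    highest. There are as many possible answers as enrolled speakers."""
    return max(templates, key=lambda name: cosine_similarity(test_vec, templates[name]))

def verify(test_vec, claimed_template, threshold=0.9):
    """Speaker verification: only two outcomes for a claimed identity.
    The 0.9 threshold is an arbitrary, hypothetical value."""
    score = cosine_similarity(test_vec, claimed_template)
    return "accept" if score >= threshold else "reject"

# Hypothetical templates (made-up three-dimensional feature vectors)
templates = {
    "speaker_a": [1.0, 0.2, 0.1],
    "speaker_b": [0.1, 1.0, 0.3],
    "speaker_c": [0.2, 0.1, 1.0],
}
test_vector = [0.9, 0.25, 0.05]

print(identify(test_vector, templates))                # speaker_a
print(verify(test_vector, templates["speaker_a"]))     # accept (true claim)
print(verify(test_vector, templates["speaker_b"]))     # reject (impostor claim)
```

In this sketch identification always returns some enrolled speaker, while verification can refuse a claimed identity outright, which is the behavior access-control applications need.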
Perfecting the performance of automatic speaker recognition systems is an ongoing goal in speech technology. The research described in this paper examines speech distorted by a cold ("cold-speech"). This "cold-speech" information will give developers of speaker recognition systems more data, so that their systems will be able to recognize speakers under both healthy and sick conditions.
"Cold-speech" and normal speech have similar properties, but there are variations: a listener can hear the differences, and computer analysis shows differences in the speech waveform. A person's voice varies from day to day even in perfect health, and the additional variation during a cold makes the problem harder. Speaker recognition systems require a large amount of speech from each person, including recordings made on different days over several months. Using these recordings to build a model of each speaker is called "training" the system. The problem is that a cold is a temporary pathology, so only a small amount of "cold-data" is available from each person: not enough to train a recognition system or to establish a pattern over a long period of time.
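As a rough illustration of why a few days of "cold-data" are not enough, the Python sketch below summarizes each recording session by one feature vector and builds a speaker template from the per-dimension mean and spread across sessions. This averaging scheme, the function name `enroll`, and the two example features (a mean pitch in Hz and an arbitrary second value) are hypothetical simplifications, not the training procedure of this study or of any particular system.

```python
from statistics import mean, stdev

def enroll(sessions):
    """Hypothetical toy 'training': summarize several recording sessions,
    each reduced to one feature vector, as a per-dimension mean and sample
    standard deviation. Returns (means, spreads); spreads is None when only
    one session exists, because a spread cannot be estimated from one sample."""
    columns = [list(col) for col in zip(*sessions)]
    means = [mean(col) for col in columns]
    spreads = [stdev(col) for col in columns] if len(sessions) > 1 else None
    return means, spreads

# Made-up values: [mean pitch in Hz, second feature]
healthy_sessions = [[120.0, 1.0], [124.0, 1.2], [118.0, 0.9], [122.0, 1.1]]
cold_sessions = [[112.0, 1.6]]  # a cold lasts only days, so few sessions exist

healthy_model = enroll(healthy_sessions)
cold_model = enroll(cold_sessions)
print(healthy_model)
print(cold_model)
```

With several healthy sessions the template records both a typical value and how much it varies from day to day; with a single "cold" session it records only one point, so the system has no way to judge which deviations are normal for that speaker while sick.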
To begin studying "cold-speech," this work analyzed several parameters of one male subject's speech recorded during winter weather. The study combines features from three areas of speech research: phonetics, nasal and laryngeal physiology, and digital speech signal processing. Drawing on the broader fields of linguistics, anatomy, and electrical engineering gives a well-rounded background for studying cold-speech. The features analyzed include the resonances of the vocal tract (the frequencies at which sound vibrates most strongly along the path from the larynx, or voice box, to the lips), the fundamental frequency (a measure of voice pitch), phonetic differences (differences in the ways sounds are produced in the mouth), and mel-cepstral coefficients (parameters designed to reflect how the human ear perceives different sounds).
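Two of the features named above can be sketched in a few lines of Python. The fundamental frequency of a voiced frame can be estimated by autocorrelation: the signal is compared with delayed copies of itself, and the delay (lag) giving the strongest match corresponds to one pitch period. The mel scale, on which mel-cepstral coefficients are built, can be computed with the widely used formula mel = 2595 log10(1 + f/700). This is a minimal sketch under stated assumptions: the 60 to 400 Hz pitch search range, the function names, and the synthetic pulse-train test signal (a crude stand-in for the glottal pulses of voiced speech) are illustrative choices, not the analysis method or settings used in this study. Computing full mel-cepstral coefficients also involves a short-time spectrum, a bank of mel-spaced filters, a logarithm, and a cosine transform, which are omitted here.

```python
import math

def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels with the common 2595*log10(1 + f/700)
    formula (one of several mel-scale variants in use)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def estimate_f0(samples, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate the fundamental frequency (Hz) of a voiced frame by picking
    the lag with the largest autocorrelation inside the allowed pitch range.
    f_min and f_max are assumed bounds, not values from the study."""
    n = len(samples)
    min_lag = int(sample_rate / f_max)              # shortest period considered
    max_lag = min(int(sample_rate / f_min), n - 1)  # longest period considered
    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation: sum of products with a delayed copy
        r = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag

# Synthetic "voiced" frame: one pulse every 80 samples at 8 kHz (100 Hz pitch)
rate = 8000
frame = [1.0 if i % 80 == 0 else 0.0 for i in range(400)]

print(estimate_f0(frame, rate))    # 100.0
print(round(hz_to_mel(1000.0)))    # 1000
```

Because the unnormalized sum has fewer terms at longer lags, it mildly favors the true period over its multiples; practical pitch trackers add normalization, voicing decisions, and smoothing across frames.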
This study compares different recording sessions of the sentence "She had your dark suit in greasy wash water all year." A team of expert listeners correctly identified the "cold-speech" sentences among the normal, healthy ones. During the "cold-speech" recording, the subject had a runny nose, stuffy nose, headache, cough, and hoarseness, and was physically tired. "Cold-speech" shows noisy portions in the acoustic signal that are not present in the normal, healthy signals; these noisy portions are caused by hoarseness and coughing. In "cold-speech" there are also differences in the ways sounds are made and differences in pitch.
This work on "cold-speech" and its relationship to physiological parameters, with the aim of improving speaker recognition, was strongly influenced by research on stressed speech, which used laryngeal waveforms to analyze different types of emotionally stressed speech. Similarly, this work on "cold-affected" speech provides a preliminary analysis of the changes in speech parameters caused by physiological differences. The effects of the physiology on the acoustic signal, and the relation of these effects to the mel-cepstral coefficients, are being investigated. This "cold-speech" research begins the long-term goal of incorporating speech affected by disorders into the framework of automatic speaker recognition.
RECENT RELATED PUBLICATIONS
Che, C., and Lin, Q., "Speaker Recognition Using HMM with Experiments on the YOHO Database", Proceedings of Eurospeech, 1995.
Cummings, K., and Clements, M., "Analysis of Glottal Waveforms Across Stress Styles", Proceedings of the 1990 International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 369-372, April, 1990.
Flanagan, J. L., "Technologies of Multimedia Communication", Proceedings of the IEEE, vol. 82, no. 4, April, 1994.
Furui, S., "An Overview of Speaker Recognition Technology", Proceedings of the ESCA Workshop on Automatic Speaker Recognition, Identification, and Verification, pp. 1-9, April, 1994.
Rosenberg, A. E., and Soong, F. K., "Recent Research in Automatic Speaker Recognition", in Advances in Speech Signal Processing, chapter 22, Furui, S., and Sondhi, M. M. (eds.), Marcel Dekker, 1992.
Tyrrell, D. A. J., Cohen, S., and Schlarb, J. E., "Signs and Symptoms in Common Colds", Epidemiology and Infection, vol. 111, no. 1, 1993.