Kathleen E. Cummings - kate@eedsp.gatech.edu
Digital Signal Processing Laboratory
School of Electrical and Computer Engineering
Georgia Institute of Technology
Collaborators: Steven B. Chin and David B. Pisoni, Speech
Research Laboratory, Department of Psychology, Indiana University
Popular Version of Paper 4aSC21
Presented Thursday morning, 16 May 1996
Acoustical Society of America, Indianapolis, Indiana
Embargoed until 16 May 1996
The goal of the research reported here was to determine whether speech produced while a person is intoxicated is significantly and identifiably different from speech produced while the same person is sober. The results of analyzing sober versus intoxicated speech for four speakers show that the parameters measuring how steadily a person produces speech are the ones most affected by alcohol use. Measures such as pitch frequency and the energy produced vary more often and by larger amounts when the speaker is intoxicated than when the speaker is sober.
Because of the physical and mental impairments associated with alcohol consumption, there is considerable interest in developing a simple, inexpensive, non-invasive method of reliably assessing whether a person is intoxicated. Current methods of determining whether a person has been consuming alcohol rely on chemical analysis of blood- or breath-alcohol concentration. The goal of this research is to find out whether intoxication can be detected from a sample of a person's speech. Such a test would not only be simple and non-invasive; it would also make it possible to assess intoxication after an incident has occurred, provided there is a recording of the person's speech at the time of the incident.
Speech is produced when air from the lungs passes through the vocal cords and excites the vocal tract, which acts as an acoustic resonator. Sounds are characterized by the manner of excitation (unvoiced, voiced, or both) and by the particular shape of the vocal tract. In unvoiced excitation, air flows uninterrupted through the vocal cords until it reaches a constriction in the vocal tract, where the turbulent flow acts as an acoustic noise source. In voiced excitation, the slit between the vocal cords, called the glottis, opens and closes periodically, producing puffs of air that excite the vocal tract. This periodic train of air pulses is commonly called the glottal excitation waveform.
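The source-filter description above can be illustrated with a toy model. The sketch below is a simplification assumed purely for illustration, not the analysis used in this research: the voiced source is an idealized impulse train (one pulse per pitch period), the unvoiced source is white noise standing in for turbulence, and the vocal tract is approximated by a cascade of two-pole resonators, one per formant. The function names and the pitch and formant values in the usage lines are hypothetical.

```python
import math
import random


def glottal_pulse_train(f0_hz, fs_hz, n_samples):
    """Idealized voiced source: one unit impulse at the start of each pitch period."""
    period = int(round(fs_hz / f0_hz))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def noise_source(n_samples, seed=0):
    """Idealized unvoiced source: uniform white noise standing in for turbulence."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]


def resonator(x, formant_hz, bandwidth_hz, fs_hz):
    """Two-pole digital resonator: y[n] = x[n] + a1*y[n-1] + a2*y[n-2].

    The pole radius r sets the bandwidth and the pole angle theta sets the
    resonance (formant) frequency; r < 1 keeps the filter stable.
    """
    r = math.exp(-math.pi * bandwidth_hz / fs_hz)
    theta = 2.0 * math.pi * formant_hz / fs_hz
    a1, a2 = 2.0 * r * math.cos(theta), -r * r
    y = []
    y1 = y2 = 0.0  # previous two outputs
    for sample in x:
        out = sample + a1 * y1 + a2 * y2
        y.append(out)
        y2, y1 = y1, out
    return y


def vocal_tract(source, formants, fs_hz):
    """Pass the source through one resonator per (formant_hz, bandwidth_hz) pair."""
    signal = source
    for formant_hz, bandwidth_hz in formants:
        signal = resonator(signal, formant_hz, bandwidth_hz, fs_hz)
    return signal


# Illustrative usage (hypothetical values): a 100 Hz voice sampled at 8 kHz.
FS_HZ = 8000
voiced = vocal_tract(glottal_pulse_train(100, FS_HZ, 800), [(500, 60), (1500, 90)], FS_HZ)
unvoiced = vocal_tract(noise_source(800), [(2500, 200)], FS_HZ)
```

Changing the pitch argument changes the spacing of the excitation pulses, while changing the formant list changes the "shape" of the simulated vocal tract; the two are independent in this model, which mirrors the separation of excitation and vocal tract described in the text.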
In previously reported research using speech samples from the same database as was used in this research, both perceptual and acoustic analyses were performed. The perceptual experiments demonstrated that listeners can reliably discriminate between sober and intoxicated speech samples. The acoustic analyses showed that the mean duration of sentences was significantly and consistently longer for speech produced in the intoxicated condition. Pitch variability was also greater in the intoxicated condition. Finally, significant differences were found between intoxicated and sober speech for sounds that require very precise timing and positioning of the articulators, such as voiced stops (e.g., 'b'), affricates (e.g., 'ch'), and stop clusters (e.g., 'st').
In the research reported here, the focus was on parameters of the speech waveform that are related to the glottal excitation. Several parameters measuring the frequency, timing, and shape of the glottal excitation were computed and compared for sober and intoxicated speech for each speaker. The most significant differences were found in parameters that measure perturbations between adjacent periods of the excitation. For each of the four speakers studied, the speech produced in the intoxicated condition was more variable than the speech produced in the sober condition. This trend toward less stable speech appeared in several parameters, including the stationarity of the vocal tract, the shape of the glottal waveform, the frequency of adjacent excitation periods, and the amount of energy produced.
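One common way to quantify period-to-period perturbation is the "local" relative measure: the mean absolute difference between consecutive values divided by the mean value. Applied to pitch periods it is usually called jitter; applied to per-period amplitudes it is usually called shimmer. The sketch below assumes these textbook definitions; it is not necessarily how the parameters in this study were computed, and the function name and the period values in the usage lines are hypothetical, not data from the study.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference between consecutive values, relative to their mean.

    For a sequence of pitch periods this is the common 'local jitter' measure;
    for a sequence of per-period peak amplitudes it is 'local shimmer'.
    A perfectly steady sequence gives 0.0; larger values mean less stable speech.
    """
    if len(values) < 2:
        raise ValueError("need at least two periods")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


# Hypothetical pitch periods in milliseconds (illustrative only).
steady_periods = [10.0, 10.1, 10.0, 10.1, 10.0]
unsteady_periods = [10.0, 10.8, 9.6, 10.9, 9.5]

steady_jitter = local_perturbation(steady_periods)
unsteady_jitter = local_perturbation(unsteady_periods)
```

Because the measure is normalized by the mean period, it compares steadiness across speakers with different average pitch; under this definition the more erratic sequence yields the larger value, which is the direction of change the study reports for intoxicated speech.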