Popular version of paper 2aSC17
Presented Tuesday morning, Dec. 4, 2001
142nd ASA Meeting, Fort Lauderdale, FL
Introduction
Recent advances in speech synthesis have provided new techniques for manipulating the properties of speech, making it possible, for example, to simulate highly realistic changes in voice quality or to enhance the acoustic properties of speech that pose difficulties for second-language learners or hearing-impaired listeners. In this paper we report the results of experiments examining the importance of time-varying changes in the acoustic properties of vowels. Traditionally, vowels were often treated as static entities: acoustic measurements were taken from a single cross-section at the vowel midpoint, and in synthesis the spectral properties of vowels were held constant over time. However, recent studies have shown that time-varying properties of vowels make an important contribution to vowel identification.
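To make the contrast between a static and a time-varying description concrete, the following minimal Python sketch compares a single midpoint measurement with the change across the whole vowel. The function names and the formant values are hypothetical and chosen only for illustration; they are not data from the study.

```python
def midpoint_value(track):
    """Return the sample at the temporal midpoint of a track.

    This mirrors the traditional 'static' description, in which a vowel
    is summarized by one cross-section taken at its midpoint.
    """
    if not track:
        raise ValueError("track must contain at least one sample")
    return track[len(track) // 2]


def onset_to_offset_change(track):
    """Return the change from the first to the last sample of a track.

    This is one simple way to summarize how much a formant moves over
    the course of the vowel; a midpoint measurement cannot capture it.
    """
    if not track:
        raise ValueError("track must contain at least one sample")
    return track[-1] - track[0]


# Hypothetical second-formant (F2) track in Hz, sampled at equal
# intervals through a vowel. The numbers are illustrative only.
f2_track = [1800.0, 1850.0, 1900.0, 1950.0, 2000.0]

static_f2 = midpoint_value(f2_track)            # one number for the whole vowel
f2_movement = onset_to_offset_change(f2_track)  # how far F2 moved
```

In this toy example the midpoint measurement reports a single F2 value, while the onset-to-offset change shows that the formant rose throughout the vowel. That rise is the kind of information a static description discards.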
Vowel quality is determined primarily by the frequencies of the vocal tract resonances, called formants, while the pitch of the voice is determined mainly by the fundamental frequency (F0), associated with vibration of the vocal folds. Synthesis studies have shown that American English vowels are less accurately identified when natural time-varying changes in the formant frequencies are eliminated (by "flattening" these changes over time). On the other hand, holding F0 constant over time produces a monotone voice pitch, but has little effect on vowel identification accuracy. Thus formant frequency movement is important for the identification of American English vowels, but F0 movement has little effect.
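The "flattening" manipulation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say which constant value replaced each contour, so the sketch uses the mean of the track by default and lets the caller supply a different value. The function name and the example F0 values are hypothetical.

```python
def flatten_track(track, value=None):
    """Return a copy of a track with all time-varying change removed.

    Every sample is replaced by one constant. By default the constant
    is the mean of the track (an assumption made for this sketch);
    pass `value` to use a different constant, such as the midpoint.
    """
    if not track:
        raise ValueError("track must contain at least one sample")
    constant = sum(track) / len(track) if value is None else value
    return [constant] * len(track)


# Hypothetical F0 contour in Hz (illustrative values only).
f0_track = [120.0, 125.0, 130.0, 125.0, 120.0]

# Flattened F0: a monotone pitch at the mean of the contour.
flat_f0 = flatten_track(f0_track)

# The same function can flatten a formant track, here at a value
# chosen by the caller rather than at the mean.
flat_f2 = flatten_track([1800.0, 1900.0, 2000.0], value=1900.0)
```

Applying the function to an F0 contour gives the monotone pitch described above, while applying it to the formant tracks removes the formant movement. A vocoder would then be needed to resynthesize the vowel from the modified tracks; that step is not shown here.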
One limitation of the earlier experiments was that the synthesized vowels were identified less accurately than natural vowels. To overcome this limitation, we used STRAIGHT, a high-quality vocoder developed by Hideki Kawahara, to re-examine the effects of spectral change and source properties on vowel identification. The results confirmed (1) that time-varying changes in the formants are important for the identification of American English vowels, and (2) that changes in fundamental frequency have little effect on vowel identification.
Stimuli
Result 1: Listeners identified the synthesized vowels as well as the natural vowels.
Result 2: Holding the fundamental frequency (F0) constant
Result 3: Holding the formant frequencies constant