ASA/CAA '05 Meeting, Vancouver, BC




Size Information in Natural Communication Sounds

Ralph van Dinther - ralph.van-dinther@mrc-cbu.cam.ac.uk
Roy D. Patterson
CNBH, Dept. of Physiology, University of Cambridge
Cambridge, CB2 3EG, United Kingdom

Popular version of paper 1aPP10
Presented Monday morning, May 16, 2005
ASA/CAA '05 Meeting, Vancouver, BC

There is size information in natural sounds. When a child and an adult say the same word, it is possible to identify which one is the smaller person. The child has a shorter vocal tract and lighter vocal cords, and as a result, the waveform carrying the message is quite different for the child. The fact that we hear the same message shows that the auditory system 'normalizes' speech sounds for vocal-tract length and glottal-pulse rate.
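One way to see why a shorter vocal tract changes the sound is to treat the tract as a uniform tube closed at one end. This is a textbook simplification, not the model used in the studies reported here. The tube lengths and the speed of sound below are illustrative assumptions, and the function name is hypothetical.

```python
# Sketch: resonances of a uniform tube closed at one end (quarter-wave resonator).
# Assumed values: speed of sound 350 m/s, adult tube 0.175 m, child tube 0.125 m.
SPEED_OF_SOUND_M_S = 350.0


def tube_resonances(length_m, count=3, speed=SPEED_OF_SOUND_M_S):
    """Return the first `count` resonance frequencies (Hz) of the tube."""
    return [(2 * n - 1) * speed / (4.0 * length_m) for n in range(1, count + 1)]


adult = tube_resonances(0.175)  # longer tube, lower resonances
child = tube_resonances(0.125)  # shorter tube, higher resonances

# Every resonance moves by the same factor (adult length / child length),
# so the pattern of resonances keeps its shape and only its scale changes.
ratios = [c / a for c, a in zip(child, adult)]
print(adult, child, ratios)
```

Under these assumptions the adult resonances fall near 500, 1500 and 2500 Hz, and the child resonances are all higher by the same factor of about 1.4. A listener who factors out that single scale value is left with the same resonance pattern for both speakers.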

Recent studies have shown that listeners can judge the relative size of two individuals with considerable precision, and that they can recognize vowels scaled to simulate people much taller and shorter than any they have ever encountered. So one thing that is realistic about fantasy films like Lord of the Rings (Figure 1) is that we could understand the speech of larger and smaller races, even if they were double the size of Gandalf or half the size of Gimli.


 

A new high-quality vocoder called STRAIGHT can encode speech sounds and resynthesize them with different glottal-pulse rates (pitches) and vocal-tract lengths. The 'red arrows' demonstration below allows you to listen to the voice of one person as we systematically change the glottal-pulse rate and the vocal-tract length, singly or in combination. Just click on one of the arrows.

 

 

It is also possible to judge the size of musical instruments from their sounds. When a violin and a cello play the same note, we know which is the larger instrument. At the same time, we recognize that both instruments belong to the string family. This implies that the auditory system normalizes all sounds, not just speech sounds.

The normalization processes are particularly important for 'pulse/resonance' communication sounds, which consist of a stream of acoustic pulses, each of which carries a complex resonance. These are the sounds that form the basis of natural communication, that is, speech, music and animal calls. Figure 3 shows the waveforms of a trumpet (top panel) and a trombone (bottom panel) with the pulses and resonances indicated by arrows. The pulse rate is the same for the two instruments, and so they have the same pitch, since pitch is the psychological correlate of the repetition rate of a musical note. The resonance is the wiggle that follows the pulse. It has roughly the same shape for the trumpet and trombone because they are both brass instruments. However, the resonance of the trombone is more dilated in time than that of the trumpet, and it is this that makes the trombone sound larger than the trumpet when they play the same note. In speech sounds, the pulse rate is simply the glottal-pulse rate; the vocal-tract shape determines the shape of the resonance, and the vocal-tract length determines the scale of the resonance. In musical instruments, the factors that determine the pulse rate, the resonance shape, and the resonance scale are much more complex, but the perception of size in the sound can still be described in terms of pulse rate and resonance scale.
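The pulse/resonance description can be made concrete with a toy synthesizer. The sketch below is only an illustration: it uses a single damped sinusoid as a stand-in for the resonance, which is far simpler than any real instrument or voice, and the function name, parameter names and default values are all assumptions made for this example.

```python
import math


def pulse_resonance(pulse_rate_hz, resonance_scale, duration_s=0.05,
                    sample_rate=16000, base_freq_hz=1100.0, base_decay_s=0.002):
    """Toy pulse/resonance sound: each pulse excites one damped sinusoid.

    resonance_scale > 1 dilates the resonance in time (lower ringing
    frequency, slower decay) without changing the pulse rate.
    """
    n = int(round(duration_s * sample_rate))
    period = int(round(sample_rate / pulse_rate_hz))  # samples between pulses
    freq = base_freq_hz / resonance_scale             # dilated ringing frequency
    decay = base_decay_s * resonance_scale            # dilated decay time
    out = [0.0] * n
    for start in range(0, n, period):                 # one resonance per pulse
        for i in range(start, n):
            t = (i - start) / sample_rate
            if t > 8 * decay:                         # resonance has died away
                break
            out[i] += math.exp(-t / decay) * math.sin(2 * math.pi * freq * t)
    return out


# Same pulse rate (same pitch), different resonance scale (different size).
smaller_source = pulse_resonance(200.0, resonance_scale=1.0)
larger_source = pulse_resonance(200.0, resonance_scale=1.5)
```

In this sketch both sounds repeat every 80 samples, so they would have the same pitch, but the wiggle after each pulse is stretched by a factor of 1.5 in the second sound. Changing `pulse_rate_hz` alone would alter the pitch while leaving the resonance untouched.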

 

This paper reports two studies designed to extend the research to musical instrument sounds. The first study showed that listeners can discriminate the relative size of instruments reliably, although not quite as well as for voices. The second showed that listeners can recognize instrument sounds modified in pulse rate and resonance scale well beyond the range of normal experience. The results support the hypothesis that the auditory system applies some kind of active normalization to all input sounds.

Other papers in session 1aPP10 describe discrimination of speaker size and recognition of speech sounds scaled well beyond the normal range. The research is important because speech recognition machines, cochlear implants and music coders do not include normalizing preprocessors, and this seriously limits their robustness.

