“How would you sketch a sound with your hands?”

Hugo Scurto – Hugo.Scurto@ircam.fr
Guillaume Lemaitre – Guillaume.Lemaitre@ircam.fr
Jules Françoise – Jules.Francoise@ircam.fr
Patrick Susini – Patrick.Susini@ircam.fr
Frédéric Bevilacqua – Frederic.Bevilacqua@ircam.fr
1 place Igor Stravinsky
75004 Paris, France


Popular version of paper 2aSCb3, “Combining gestures and vocalizations to imitate sounds”
Presented Tuesday morning, November 3, 2015, 10:30 AM in Grand Ballroom 8
170th ASA Meeting, Jacksonville



Scurto fig 1

Figure 1. A person hears the sound of door squeaking and imitates it with vocalizations and gestures. Can the other person understand what he means?

Have you ever listened to an old Car Talk show? Here is what it sounded like on NPR back in 2010:

“So, when you start it up, what kind of noises does it make?
– It just rattles around for about a minute. […]
– Just like budublu-budublu-budublu?
– Yeah! It’s definitely bouncing off something, and then it stops”


As the example illustrates, it is often very complicated to describe a sound with words. But it is really easy to make it with our built-in sound-making system: the voice! In fact, we have observed earlier that this is exactly what people do: when we ask a person to communicate a sound to another person, she will very quickly try to recreate this noise with her voice – and also use a lot of gestures.

And this works! Communicating sounds with voice and gesture is much more effective than describing them with words and sentences. Imitations of sounds are fun, expressive, spontaneous, widespread in human communication, and very effective. These non-linguistic vocal utterances have been little studied, but nevertheless have the potential to provide researchers with new insights into several important questions in domains such as articulatory phonetics and auditory cognition.

The study we are presenting at this ASA meeting is part of a larger European project on how people imitate sounds with voice and gestures: SkAT-VG (“Sketching Audio Technologies with Voice and Gestures”, http://www.skatvg.eu): How do people produce vocal imitations (phonetics)? What are imitations made of (acoustics and gesture analysis)? How do other people interpret them (psychology)? The ultimate goal is to create “sketching” tools for sound designers (the persons that create the sounds of everyday products). If you are an architect and want to sketch a house, you can simply draw it on a sketchpad. But what do you do if you are a sound designer and want to rapidly sketch the sound of a new motorbike? Well, all that is available today are cumbersome pieces of software. Instead, the Skat-VG project aims to offer sound designers new tools that are as intuitive as a sketching pad: simply use their voice and gestures to control complex sound design tools. Therefore, the SkAT-VG project also conducts research in machine learning, sound synthesis, and studies how sound designers work.

Here at the ASA meeting, we are presenting a partial study in which we asked the question: “What do people use gestures for when they imitate a sound?” In fact, people use a lot of gestures, but we do not know what information these gestures convey: Are they redundant with the voice? Do they convey specific pieces of information that the voice cannot represent?

We first collected a huge database of vocal and gestural imitations. Then, we asked 50 participants to come to our lab and make vocal and gestural imitations for several hours. We recorded their voice, filmed them with a high-speed camera, and used a depth camera and accelerometers to measure their gestures. This resulted in a database of about 8000 imitations! This database is an unprecedented amount of material that now allows


We first analyzed the database qualitatively, by watching and annotating the videos. From this analysis, several hypotheses about the combination of gestures and vocalizations were drawn. Then, to test these hypotheses, we asked 20 participants to imitate 25 specially synthesized sounds with their voice and gestures.

The results showed a quantitative advantage of voice over gesture for communicating rhythmic information. Voice can reproduce accurately higher tempos than gestures, and is more precise than gestures when reproducing complex rhythmic patterns. We also found that people often use gestures in a metaphorical way, whereas voice reproduces some acoustic features of the sound. For instance, people shake their hands very rapidly whenever a sound is stable and noisy. This type of gesture does not really follow a feature of the sound: it simply means that the sound is noisy.

Overall, our study reveals the metaphorical function of gestures during sound imitation. Rather than following an acoustic characteristic, gestures expressively emphasize the vocalization and signal the most salient features. These results will inform the specifications of the SkAT-VG tools and make the tools more intuitive.



Share This