Julian Palacino – firstname.lastname@example.org
Rozenn Nicol – email@example.com
2 Av Pierre Marzin
Popular version of paper 4aSP2
Presented Thursday morning, June 6, 2013
ICA 2013 Montreal
Nowadays, everyone has heard about 3D video, but few of us have actually heard 3D audio. The aim of 3D audio techniques is to record and reproduce sound as naturally as we perceive it in real life.
Human beings use binaural perception (hearing with both ears) to locate an acoustic source. A sound arrives earlier and louder at the ear closest to the source, and our head and pinnae (outer ears) modify the sound according to its direction. Our brain learns to interpret those cues, which lets us determine the position of a sound.
For several decades, spatial or 3D audio has been used only by movie makers, music composers and researchers in laboratories; because of its complexity, these techniques have not reached the general public. In addition, dedicated devices such as microphones and loudspeakers are expensive and cannot be used without some expertise in audio capture and reproduction.
Currently, audio techniques such as stereo, 5.1 or 7.1 give a limited spatial impression. Naturalness is sacrificed to obtain good sound quality, and the resulting sound differs from what was heard during the performance.
(see figure 1 - Traditional recording process and downmix: instruments, sound scene and ambience are generally picked up with a large number of microphones. All those signals are then mixed by a sound engineer, who sets how loud each microphone is played on each loudspeaker).
Other approaches have also been used to pick up and reproduce the acoustic waves as close as possible to the sound produced during the performance. Binaural recording uses a dummy head equipped with two microphones in its ears; the signals picked up are shaped by the head and the pinnae, so the recorded sound is as close as possible to the one heard by a human (see figure 2, Neumann KU4 – Commercial dummy head for binaural recordings). For playback, the sounds must be reproduced close to the listener's ears, which implies the use of headphones. Ambisonics and higher order ambisonics (HOA) record a 3D acoustic image using a large number of microphones (see figure 3, Eigenmike – HOA microphone array composed of 32 microphones). For reproduction, this image is projected over a large number of loudspeakers.
Nowadays, the main barrier to a consumer solution for capturing spatial audio is the large number of microphones and loudspeakers needed to get an accurate 3D sound image, or the size of the recording devices. To break down this barrier, we propose a new 3D audio recording set-up composed of a three-microphone array capable of capturing the full 3D audio information.
The microphone array is composed of three cardioid microphones (see figure 4, Layout of the microphone device): one pointing left, one pointing right and the third pointing upwards. To introduce a time delay between them, the horizontal microphones are misaligned.
Unlike a sound picked up with an omnidirectional microphone, whose level doesn't depend on the direction of the source, a sound picked up with a cardioid microphone has a level that depends on the direction of the sound source.
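To make the difference between the two microphone types concrete, here is a minimal sketch assuming the textbook ideal cardioid pattern, whose gain is 0.5 × (1 + cos θ) for a source at angle θ from the microphone axis. Real microphones only approximate this pattern, and the function names are illustrative, not part of the original work.

```python
import math


def cardioid_gain(angle_deg: float) -> float:
    """Gain of an ideal cardioid for a source angle_deg away from the mic axis."""
    return 0.5 * (1.0 + math.cos(math.radians(angle_deg)))


def omni_gain(angle_deg: float) -> float:
    """An ideal omnidirectional microphone has the same gain in every direction."""
    return 1.0


if __name__ == "__main__":
    # Compare both patterns on-axis, from the side, and from behind.
    for angle in (0, 90, 180):
        print(angle, round(cardioid_gain(angle), 3), omni_gain(angle))
```

With this model the cardioid level is highest on-axis, halved from the side, and vanishes from behind, while the omnidirectional level stays constant.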
In our case, a signal close to the one picked up by an omnidirectional microphone can be recovered by adding the signals from the cardioid microphones pointing left and right. By comparing the level of this reference signal with the level picked up by each microphone, one can get the position of the sound source. As the array is symmetrical, the results are front-rear ambiguous. Since the sound arrives first at the microphone closest to the source, this timing information is used to resolve the ambiguity and get the right position.
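The level comparison and the front-rear disambiguation can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the cardioids are ideal, the source is distant and in the horizontal plane, levels are amplitudes, and the misalignment geometry (`spacing_m`, `offset_m`) is hypothetical. Azimuth 0 is straight ahead and +90 degrees is to the left.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def simulate_levels(azimuth_deg):
    """Amplitude levels at ideal left/right cardioids for a horizontal source."""
    s = math.sin(math.radians(azimuth_deg))
    return 0.5 * (1.0 + s), 0.5 * (1.0 - s)


def predicted_delay(azimuth_deg, spacing_m, offset_m):
    """Arrival time at the right mic minus the left mic, for a distant source.

    Hypothetical geometry: left mic at (+offset, +spacing/2), right mic at
    (-offset, -spacing/2), with x pointing forward and y pointing left.
    """
    phi = math.radians(azimuth_deg)
    path_difference = spacing_m * math.sin(phi) + 2.0 * offset_m * math.cos(phi)
    return path_difference / SPEED_OF_SOUND


def wrap_degrees(angle_deg):
    """Wrap an angle into the range [-180, 180)."""
    return (angle_deg + 180.0) % 360.0 - 180.0


def estimate_azimuth(left, right, measured_delay, spacing_m, offset_m):
    """Estimate azimuth from the two levels, then pick front or rear by delay."""
    omni = left + right  # the cardioid sum approximates an omnidirectional reference
    if omni <= 0.0:
        raise ValueError("no signal to localize")
    lateral = max(-1.0, min(1.0, (left - right) / omni))
    front = math.degrees(math.asin(lateral))  # candidate in the front half-plane
    rear = wrap_degrees(180.0 - front)        # mirror candidate behind the array
    # Keep the candidate whose predicted inter-microphone delay matches best.
    return min(
        (front, rear),
        key=lambda a: abs(predicted_delay(a, spacing_m, offset_m) - measured_delay),
    )
```

Because the two horizontal microphones are offset along the front-back axis in this sketch, a source in front and its mirror image behind produce the same levels but different delays, which is what lets the delay select the correct candidate.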
The two microphones placed in the horizontal plane give the localization information in terms of azimuth, and the one pointing upwards gives the elevation. Together, those two parameters give the location of the sound source. In several cases, elevation information is not needed and a two-microphone array provides the necessary information.
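Extending the same idea to the third microphone, the sketch below recovers both angles from three levels. It assumes ideal, co-located cardioids and amplitude levels, ignores the front-rear ambiguity (the azimuth is returned in the front half-space), and uses illustrative function names that do not come from the original paper.

```python
import math


def _clamp(x):
    """Keep a value inside [-1, 1] so that asin stays defined."""
    return max(-1.0, min(1.0, x))


def simulate_levels_3d(azimuth_deg, elevation_deg):
    """Amplitude levels at ideal cardioids pointing left, right and up."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    lateral = math.cos(el) * math.sin(az)  # projection on the left-right axis
    left = 0.5 * (1.0 + lateral)
    right = 0.5 * (1.0 - lateral)
    up = 0.5 * (1.0 + math.sin(el))        # projection on the vertical axis
    return left, right, up


def estimate_direction(left, right, up):
    """Return (azimuth_deg, elevation_deg) estimated from the three levels."""
    omni = left + right  # omnidirectional reference from the horizontal pair
    if omni <= 0.0:
        raise ValueError("no signal to localize")
    elevation = math.asin(_clamp(2.0 * up / omni - 1.0))
    cos_el = math.cos(elevation)
    if cos_el < 1e-9:
        # Source directly above or below: azimuth is undefined, report 0.
        return 0.0, math.degrees(elevation)
    azimuth = math.asin(_clamp((left - right) / (omni * cos_el)))
    return math.degrees(azimuth), math.degrees(elevation)
```

When only azimuth is needed, the `up` level can be dropped and the horizontal pair alone is enough, which matches the two-microphone case described above.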
As sound sources are well localized, this technique can be the first step towards sound reproduction over any kind of spatial audio system, such as stereo, 5.1, 22.1, binaural, ambisonics or wavefield synthesis. The small number of microphones and the size of the array make this system compatible with mobile devices such as smartphones and tablets, and we can expect to find it on the market as a third-party accessory for audio applications or directly embedded in this kind of device.