3aSPb5 – Improving Headphone Spatialization: Fixing a problem you’ve learned to accept

Muhammad Haris Usmani – usmani@cmu.edu
Ramón Cepeda Jr. – rcepeda@andrew.cmu.edu
Thomas M. Sullivan – tms@ece.cmu.edu
Bhiksha Raj – bhiksha@cs.cmu.edu
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213

Popular version of paper 3aSPb5, “Improving headphone spatialization for stereo music”
Presented Wednesday morning, May 20, 2015, 10:15 AM, Brigade room
169th ASA Meeting, Pittsburgh

The days of grabbing a drink, brushing dust from your favorite record, and playing it in the listening room of the house are long gone. Today, with the portability technology has enabled, almost everybody listens to music on headphones. However, most commercially produced stereo music is mixed and mastered for playback on loudspeakers, and this presents a problem for the growing number of headphone listeners. When a legacy stereo mix is played on headphones, all instruments or voices in the piece are placed between the listener's ears, inside the head. This is not only unnatural and fatiguing for the listener, but also detrimental to the original placement of the instruments in the piece. It disturbs the spatialization of the music and makes the sound image appear as three isolated lobes inside the listener's head [1], as shown in Figure 1.

Figure 1: The sound image of a legacy stereo mix on headphones, separating into three isolated lobes inside the listener's head.

Hard-panned instruments separate into the left and right lobes, while instruments placed at center stage are heard in the center of the head. However, because hearing is a dynamic process that adapts and settles with the perceived sound, we have simply come to accept that headphones sound this way [2].

In order to improve the spatialization of headphones, the listener’s ears must be deceived into thinking that they are listening to the music inside of a listening room. When playing music in a room, the sound travels through the air, reverberates inside the room, and interacts with the listener’s head and torso before reaching the ears [3]. These interactions add the necessary psychoacoustic cues for perception of an externalized stereo soundstage presented in front of the listener. If this listening room is a typical music studio, the soundstage perceived is close to what the artist intended. Our work tries to place the headphone listener into the sound engineer’s seat inside a music studio to improve the spatialization of music. For the sake of compatibility across different headphones, we try to make minimal changes to the mastering equalization curve of the music.

Since there is a compromise between sound quality and the spatialization that can be presented, we developed three different systems that present different levels of such compromise. We label these as Type-I, Type-II, and Type-0. Type-I focuses on improving spatialization but at the cost of losing some sound quality, Type-II improves spatialization while taking into account that the sound quality is not degraded too much, and Type-0 focuses on refining conventional listening by making the sound image more homogeneous. Since the sound quality is key in music, we will skip over Type-I and focus on the other two systems.

Type-II consists of a head-related transfer function (HRTF) model [4], room reverberation (synthesized reverb [5]), and a spectral correction block. HRTFs embody all the complex spatialization cues that arise from the relative positions of the listener and the source [6]. In our case, a general HRTF model is used and configured to place the listener at the "sweet spot" in the studio (left and right speakers at an angle of 30° from the listener's head). The spectral correction attempts to keep the original mastering equalization curve as intact as possible.
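To make the Type-II signal flow more concrete, here is a minimal sketch in Python of that kind of rendering chain. It assumes generic head-related impulse responses (HRIRs) for virtual speakers at ±30° and a room impulse response for the reverb; the function names, equalizer breakpoints, and gain values are placeholders for illustration, not the exact implementation evaluated in the paper.

```python
import numpy as np
from scipy.signal import fftconvolve, firwin2

def render_type2(left, right, hrirs, reverb_ir, reverb_gain=0.2):
    """left, right: channels of the stereo mix (1-D arrays of equal length).
    hrirs: dict with 'ipsi' and 'contra' impulse responses (equal length) of a
    generic HRTF model for a loudspeaker at 30 degrees, assuming a
    left-right symmetric head."""
    # Each virtual loudspeaker is heard by both ears (ipsi- and contralateral paths).
    ear_L = fftconvolve(left, hrirs['ipsi']) + fftconvolve(right, hrirs['contra'])
    ear_R = fftconvolve(right, hrirs['ipsi']) + fftconvolve(left, hrirs['contra'])

    # Mix in synthesized room reverberation (e.g. a studio impulse response).
    n = len(ear_L)
    ear_L = ear_L + reverb_gain * fftconvolve(ear_L, reverb_ir)[:n]
    ear_R = ear_R + reverb_gain * fftconvolve(ear_R, reverb_ir)[:n]

    # Spectral correction: a mild FIR equalizer nudging the result back toward
    # the original mastering curve (breakpoints here are placeholders).
    eq = firwin2(257, [0.0, 0.1, 0.5, 1.0], [1.0, 1.1, 1.0, 0.9])
    return fftconvolve(ear_L, eq), fftconvolve(ear_R, eq)
```

The ±30° geometry mirrors the "sweet spot" described above: the dry HRTF paths create the frontal stereo image, while the reverb contributes the sense of externalization.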

Type-0 is made up of a side-content crossfeed block and a spectral correction block. Some headphone amps allow crossfeed between the left and right channels to model the fact that, when listening to music through loudspeakers, each ear hears both speakers, with a delay attached to the sound arriving from the farther speaker. A shortcoming of conventional crossfeed is that the delay we can apply is limited, to avoid comb filtering [7]. Side-content crossfeed resolves this by crossfeeding only the content that is unique to each channel, allowing us to use larger delays. In this system, the side content is extracted with a stereo-to-3 upmixer, implemented as a novel extension of Nikunen et al.'s upmixer [8].
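The sketch below is only a rough approximation of the side-content crossfeed idea: in place of the NTF-based stereo-to-3 upmixer used in the paper, a simple mid/side split stands in for the extraction of the "unique" content, and the delay and gain values are assumptions rather than the settings that were evaluated.

```python
import numpy as np

def type0_crossfeed(left, right, fs, delay_ms=0.9, feed_gain=0.4):
    mid = 0.5 * (left + right)           # content shared by both channels
    side_L = left - mid                  # content unique to the left channel
    side_R = right - mid                 # content unique to the right channel

    d = int(round(delay_ms * 1e-3 * fs))  # crossfeed delay in samples
    delayed_L = np.concatenate([np.zeros(d), side_L])[:len(side_L)]
    delayed_R = np.concatenate([np.zeros(d), side_R])[:len(side_R)]

    # Only the side content is crossfed, so the shared (mid) content is not
    # comb-filtered and a larger delay than in conventional crossfeed is usable.
    out_L = left + feed_gain * delayed_R
    out_R = right + feed_gain * delayed_L
    return out_L, out_R
```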

These systems were put to the test in a subjective evaluation with 28 participants, all between 18 and 29 years of age. The participants were introduced to the metrics being measured at the beginning of the evaluation. Since the first part of the evaluation included specific spatial metrics that can be difficult for untrained listeners to grasp, we used a collection of descriptions, diagrams, and/or music excerpts representing each metric to provide in-evaluation training for the listeners. The results of the first part of the evaluation suggest that this method worked well.
From the results we concluded that Type-II externalized the sounds while performing at a level comparable to the original source on the other metrics, and that Type-0 improved sound quality and comfort at the cost of some stereo width compared with the original source, which is what we expected. We also observed strong content dependence in the results, suggesting that different spatialization settings should be used for music that has been produced differently. Overall, two of the three proposed systems in this work are preferred in equal or greater amounts to the legacy stereo mix.


References

[1] G-Sonique, “Monitor MSX5 – Headphone monitoring system,” G-Sonique, 2011. [Online]. Available: http://www.g-sonique.com/msx5headphonemonitoring.html.
[2] S. Mushendwa, “Enhancing Headphone Music Sound Quality,” Aalborg University – Institute of Media Technology and Engineering Science, 2009.
[3] Y. G. Kim et al., "An Integrated Approach of 3D Sound Rendering," PCM 2010, Part II, Springer-Verlag Berlin Heidelberg, pp. 682–693, 2010.
[4] D. Rocchesso, “3D with Headphones,” in DAFX: Digital Audio Effects, Chichester, John Wiley & Sons, 2002, pp. 154-157.
[5] P. E. Roos, “Samplicity’s Bricasti M7 Impulse Response Library v1.1,” Samplicity, [Online]. Available: http://www.samplicity.com/bricasti-m7-impulse-responses/.
[6] R. O. Duda, “3-D Audio for HCI,” Department of Electrical Engineering, San Jose State University, 2000. [Online]. Available: http://interface.cipic.ucdavis.edu/sound/tutorial/. [Accessed 15 4 2015].
[7] J. Meier, “A DIY Headphone Amplifier With Natural Crossfeed,” 2000. [Online]. Available: http://headwize.com/?page_id=654.
[8] J. Nikunen, T. Virtanen and M. Vilermo, “Multichannel Audio Upmixing by Time-Frequency Filtering Using Non-Negative Tensor Factorization,” Journal of the AES, vol. 60, no. 10, pp. 794-806, October 2012.

5aMU1 – Understanding timbral effects of multi-resonator/generator systems of wind instruments in the context of western and non-western music

Popular version of poster 5aMU1
Presented Friday morning, May 22, 2015, 8:35 AM – 8:55 AM, Kings 4
169th ASA Meeting, Pittsburgh

In this paper, the relationship between musical instruments and the rooms they are performed in was investigated. A musical instrument is typically characterized as a system consisting of a tone generator combined with a resonator. A saxophone, for example, has a reed as a tone generator and a conically shaped resonator whose effective length can be changed with keys to produce different musical notes. Often neglected is the fact that for all wind instruments a second resonator is coupled to the tone generator: the vocal cavity. We use our vocal cavity every day when we speak to form characteristic formants, local enhancements in frequency that shape vowels. This is achieved by varying the diameter of the vocal tract at specific positions along its axis. In contrast to the resonator of a wind instrument, the vocal tract is fixed in length by the distance between the vocal cords and the lips. Consequently, the vocal tract cannot be used to change the fundamental frequency over a larger melodic range; for our voice, the change in frequency is controlled via the tension of the vocal cords. The musical instrument's resonator, however, is not an adequate device to control the timbre (harmonic spectrum) of the instrument, because it can only be varied in length but not in width. Therefore, the player's adjustment of the vocal tract is necessary to control the timbre of the instrument. While some instruments possess additional mechanisms to control timbre, e.g., via the embouchure, which controls the tone generator directly using the lip muscles, for others like the recorder the timbre can only be shaped by changes in the wind supply provided by the lungs and by changes of the vocal tract.

The role of the vocal tract has not been addressed systematically in the literature and in learning guides, for two obvious reasons. Firstly, there is no known systematic approach for quantifying the internal body movements that shape the vocal tract; each performer has to figure out the best vocal tract configurations in an intuitive manner. For the resonator system, the changes are described through the musical notes, and in cases where multiple ways exist to produce the same note, additional signs indicate how to finger that note (e.g., by specifying a particular key combination). Secondly, in western classical music culture the vocal tract adjustments predominantly have a correctional function, balancing out the harmonic spectrum to make the instrument sound as even as possible across the register.
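As a rough back-of-the-envelope illustration of this division of labor, the playing pitch follows the resonator length while the formant pattern follows the roughly fixed-length vocal tract. The numbers below come from idealized uniform tubes, not from the model used in the study.

```python
# Illustrative numbers only: idealized uniform tubes, not the study's model.
C = 343.0  # speed of sound in air, m/s

def open_cone_fundamental(length_m):
    # A complete cone (rough saxophone idealization) behaves like an
    # open-open pipe: f1 = c / (2 L), so shortening L with keys raises the pitch.
    return C / (2 * length_m)

def closed_tube_resonances(length_m, n_modes=3):
    # A uniform tube closed at one end (rough vocal-tract idealization) has
    # resonances at odd multiples of c / (4 L): the neutral vowel formants.
    return [(2 * k - 1) * C / (4 * length_m) for k in range(1, n_modes + 1)]

print(open_cone_fundamental(0.70))    # ~245 Hz for a ~70 cm conical bore
print(closed_tube_resonances(0.17))   # ~[504, 1513, 2522] Hz for a ~17 cm tract
```

Because the vocal tract's length is fixed, only the relative strengths and positions of these formant-like resonances can be moved by reshaping its cross-section, which is exactly the timbral, rather than melodic, control described above.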


PVC-Didgeridoo adapter for soprano saxophone

In non-western cultures, the role of the oral cavity can be much more important in conveying musical meaning. The didgeridoo, for example, has a fixed resonator with no keyholes and can consequently only produce a single-pitched drone. The musical parameter space is instead defined by modulating the overtone spectrum above that tone by changing the vocal tract dimensions and by creating vocal sounds on top of the buzzing lips at the didgeridoo's edge. Mouthpieces of Western brass instruments have a cup behind the rim with a very narrow opening to the resonator, the throat. The didgeridoo does not have a cup; its rim is simply the edge of the resonator, finished with a ring of beeswax. While the narrow throat of a Western mouthpiece mutes additional sounds produced with the voice, didgeridoos are very open from end to end and carry the voice much better.

The room a musical instrument is performed in acts as a third resonator, which also affects the timbre of the instrument. In our case, the room was simulated using a computer model with early reflections and late reverberation.
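A minimal sketch of this kind of room simulation might look like the following: a handful of discrete early reflections followed by an exponentially decaying noise tail standing in for the late reverberation. The delay, gain, and decay values are placeholders, not the settings used in the study.

```python
import numpy as np

def simple_room_ir(fs, early=((0.012, 0.5), (0.019, 0.35), (0.027, 0.25)),
                   rt60=1.2, length_s=1.5, seed=0):
    n = int(length_s * fs)
    ir = np.zeros(n)
    ir[0] = 1.0                              # direct sound
    for delay_s, gain in early:              # discrete early reflections
        ir[int(delay_s * fs)] += gain
    t = np.arange(n) / fs
    rng = np.random.default_rng(seed)
    tail = rng.standard_normal(n) * 10 ** (-3 * t / rt60)  # -60 dB at rt60
    return ir + 0.1 * tail

def place_instrument_in_room(signal, fs):
    # Convolving the (anechoic) instrument recording with the room impulse
    # response adds the room's coloration as a "third resonator".
    return np.convolve(signal, simple_room_ir(fs))
```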


Tone generators for soprano saxophone, from left to right: Chinese bawu, soprano saxophone, bassoon reed, cornetto.

In general, it is difficult to assess the effect of a mouthpiece and a resonator individually, because both vary across instruments: the trumpet, for example, has a narrow cylindrical bore with a brass mouthpiece, while the saxophone has a wide conical bore with a reed-based mouthpiece. To mitigate this effect, several tone generators were adapted for a soprano saxophone, including a brass mouthpiece from a cornetto, a bassoon mouthpiece, and a didgeridoo adapter made from a 140 cm folded PVC pipe that can also be attached to the saxophone. It turns out that exchanging the tone generator changes the timbre of the saxophone significantly. The cornetto mouthpiece gives the instrument a much mellower tone. Similar to the baroque cornetto, the instrument then sounds better in a bright room with a lot of high frequencies, while the saxophone is at home in a 19th-century concert hall with a steeper roll-off at high frequencies.
