3aAA10 – Localization and externalization in binaural reproduction with sparse HRTF measurement grids


Zamir Ben-Hur – zami@post.bgu.ac.il
Boaz Rafaely – br@bgu.ac.il
Department of Electrical and Computer Engineering,
Ben-Gurion University of the Negev,
Beer-Sheva, 84105, Israel.

David Lou Alon – davidalon@fb.com
Ravish Mehra – ravish.mehra@oculus.com
Oculus & Facebook,
1 Hacker Way,
Menlo Park, CA 94025, USA.

Popular version of paper 3aAA10, “Localization and externalization in binaural reproduction with sparse HRTF measurement grids”.
Presented Wednesday morning, May 9, 2018, 11:40-11:55 AM,
175th ASA Meeting, Minneapolis.

High-quality spatial sound reproduction is important for many applications of virtual and augmented reality. Spatial audio gives the listener the sensation that sound arrives from the surrounding 3D space, leading to immersive virtual soundscapes. To create such a virtual sound scene with headphone listening, a binaural reproduction technique is used. A key component in binaural reproduction is the head-related transfer function (HRTF). An HRTF is a mathematical representation that describes how a listener’s head, ears, and torso affect the acoustic path from the direction of a sound source into the ear canal [1]. An HRTF set is typically measured for an individual in an anechoic chamber using an HRTF measurement system. Alternatively, a generic HRTF set is measured using a manikin. To achieve a realistic spatial audio experience, in terms of sound localization and externalization, a high-resolution personalized HRTF (P-HRTF) is necessary. Localization refers to the ability to present a sound at an accurate location in 3D space. Externalization is the ability to perceive the virtual sound as coming from outside the head, as in real-world environments.

A typical P-HRTF set is composed of several hundred to several thousand source directions measured around a listener, using a procedure that requires expensive, specialized equipment and can take a long time to complete. This motivates the development of methods that require fewer spatial samples but still allow accurate reconstruction of P-HRTF sets with high spatial resolution. Given only a sparsely measured P-HRTF, the directions that were not measured must be reconstructed, which introduces interpolation error that may lead to poor spatial audio reproduction [2]. It is therefore important to better understand this interpolation error and its effect on spatial perception. If the error is too large, a generic HRTF may be preferable to a sparse P-HRTF. Figure 1 illustrates the question addressed in this study.

Figure 1. Illustration of the challenge of this paper.

Prior studies suggested representing the HRTF in the spherical-harmonics (SH) domain. Using SH decomposition, it is possible to reconstruct a high-resolution P-HRTF from a small number of measurements [3,4]. When using the SH representation, the reconstruction error can be caused by spatial aliasing and/or by truncation of the SH series [4,5,6]. Aliasing refers to the loss of the ability to represent high frequencies due to the limited number of measurements. Truncation error refers to the order limit imposed on the SH representation, which further limits the spatial resolution. With a small number of measurements, both errors contribute to the overall reconstruction error.
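
The link between the number of measurements and the SH order can be illustrated with a simple counting argument: an SH representation truncated at order N has (N + 1)^2 coefficients, so at least that many measured directions are needed to estimate them. The minimal Python sketch below applies only this counting rule. The function names are hypothetical, and the order that is actually achievable also depends on how the measurement directions are distributed over the sphere, which the sketch does not model.

```python
import math


def num_sh_coefficients(order: int) -> int:
    """Number of SH coefficients in a series truncated at `order`."""
    return (order + 1) ** 2


def max_sh_order(num_measurements: int) -> int:
    """Largest order N such that (N + 1)**2 <= Q.

    Counting argument only (an assumption of this sketch): it ignores
    the sampling scheme, which can lower the usable order in practice.
    """
    if num_measurements < 1:
        raise ValueError("need at least one measurement direction")
    return math.isqrt(num_measurements) - 1


if __name__ == "__main__":
    for q in (121, 169):
        n = max_sh_order(q)
        print(f"Q = {q}: order up to N = {n} "
              f"({num_sh_coefficients(n)} coefficients)")
```

Under this rule, 121 measurements are just enough for order 10, since (10 + 1)^2 = 121, and 169 measurements would allow at most order 12.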

In this study, the effect of sparse measurement grids on the reproduced binaural signal is perceptually evaluated through virtual localization and externalization tests under varying conditions.

Six adult subjects participated in the experiment. The experiment was performed with the Oculus Rift headset and a pair of floating earphones (see Fig. 2). These floating earphones enabled per-user headphone equalization for the study. The stimulus, 10 seconds of band-pass-filtered white noise (0.5-15 kHz), was played back using a real-time binaural reproduction system. The system allows reproduction of a virtual sound source in a given direction, using a specific HRTF set chosen according to the test condition. In each trial, the sound was played from a different direction, and the subject was instructed to point to this direction using a virtual laser pointer controlled by the subject’s head movement. Next, the participant was asked to report whether the stimulus was externalized or internalized.

Figure 2. The experiment setup, including a Rift headset and floating earphones.

We analyzed the localization results by means of angular errors. The angular errors were calculated as the difference between the perceptually localized position and the true target position. Figure 3 depicts the mean sound localization performance for different test conditions (Q, N), where Q is the number of measurements and N is the SH order. The figure shows the error averaged across all directions (upper plot) and the errors in azimuth and elevation separately (lower plots). The externalization results were analyzed as the average percentage of responses that the subjects marked as externalized. Figure 4 shows the externalization results averaged across all directions and subjects.
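
The measures described above can be sketched in a few lines of Python. This is an illustration, not the analysis code used in the study. It assumes that the overall angular error is the great-circle angle between the perceived and target directions, that directions are given as (azimuth, elevation) pairs in degrees, and that azimuth differences are wrapped to [-180, 180). All function names and the sample values are hypothetical.

```python
import math


def direction_to_unit_vector(azimuth_deg, elevation_deg):
    """Convert an (azimuth, elevation) direction in degrees to a unit vector."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az),
            math.cos(el) * math.sin(az),
            math.sin(el))


def angular_error_deg(perceived, target):
    """Great-circle angle in degrees between two (azimuth, elevation) pairs."""
    u = direction_to_unit_vector(*perceived)
    v = direction_to_unit_vector(*target)
    dot = sum(a * b for a, b in zip(u, v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))


def azimuth_error_deg(perceived_az, target_az):
    """Signed azimuth difference in degrees, wrapped into [-180, 180)."""
    return (perceived_az - target_az + 180.0) % 360.0 - 180.0


def externalization_rate(responses):
    """Percentage of trials that were reported as externalized (True)."""
    if not responses:
        raise ValueError("no responses given")
    return 100.0 * sum(responses) / len(responses)


if __name__ == "__main__":
    # Made-up trial data, for illustration only.
    print(angular_error_deg(perceived=(35.0, 20.0), target=(30.0, 0.0)))
    print(azimuth_error_deg(350.0, 10.0))                     # -20.0
    print(externalization_rate([True, True, False, True]))    # 75.0
```

Wrapping the azimuth difference avoids reporting a 340-degree error when the perceived and target directions are only 20 degrees apart across the 0/360 boundary.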

The results demonstrate that a higher number of measurements leads to better localization and externalization performance, with most of the effect in the elevation angles. A P-HRTF with 121 measurements and SH order 10 achieves results similar to those of a generic HRTF. The results suggest that at least 169 directional measurements are required to achieve better localization and externalization performance than a generic HRTF.

Figure 3. Localization results: angular error (in degrees) for different conditions (Q, N), where Q is the number of measurements and N is the SH order. The upper plot shows the overall angular error, and the lower plots show the separate errors for azimuth and elevation.

Figure 4. Results of externalization performance.

References

[1] J. Blauert, “Spatial Hearing: The Psychophysics of Human Sound Localization,” MIT Press, 1997.

[2] P. Runkle, M. Blommer, and G. Wakefield, “A comparison of head related transfer function interpolation methods,” in Applications of Signal Processing to Audio and Acoustics, 1995., IEEE ASSP Workshop on. IEEE, 1995, pp. 88–91.

[3] M. J. Evans, J. A. Angus, and A. I. Tew, “Analyzing head-related transfer function measurements using surface spherical harmonics,” The Journal of the Acoustical Society of America, vol. 104, no. 4, pp. 2400–2411, 1998.

[4] G. D. Romigh, D. S. Brungart, R. M. Stern, and B. D. Simpson, “Efficient real spherical harmonic representation of head-related transfer functions,” IEEE Journal of Selected Topics in Signal Processing, vol. 9, no. 5, pp. 921–930, 2015.

[5] B. Rafaely, B. Weiss, and E. Bachmat, “Spatial aliasing in spherical microphone arrays,” IEEE Transactions on Signal Processing, vol. 55, no. 3, pp. 1003–1010, 2007.

[6] A. Avni, J. Ahrens, M. Geier, S. Spors, H. Wierstorf, and B. Rafaely, “Spatial perception of sound fields recorded by spherical microphone arrays with varying spatial resolution,” The Journal of the Acoustical Society of America, vol. 133, no. 5, pp. 2711–2721, 2013.

2aNS – How virtual reality technologies can enable better soundscape design

W.M. To – wmto@ipm.edu.mo
Macao Polytechnic Institute, Macao SAR, China.
A. Chung – ac@smartcitymakter.com
Smart City Maker, Denmark.
B. Schulte-Fortkamp – b.schulte-fortkamp@tu-berlin.de
Technische Universität Berlin, Berlin, Germany.

Popular version of paper 2aNS, “How virtual reality technologies can enable better soundscape design”
Presented Tuesday morning, November 29, 2016
172nd ASA Meeting, Honolulu

Quality of life, including good sound quality, has been sought by community members as part of the smart city initiative. While many governments have paid special attention to waste management and to air and water pollution, management of the acoustic environment in cities has been directed toward the control of noise, in particular transportation noise. Governments that care about tranquility in cities rely primarily on setting so-called acceptable noise levels, i.e., purely quantitative targets for compliance and improvement [1]. Sound quality is most often ignored. Recently, the International Organization for Standardization (ISO) released a standard on soundscape [2]. However, sound quality is a subjective matter and depends heavily on human perception in different contexts [3]. For example, China’s public parks are well known to be rather noisy in the morning due to the activities of boisterous amateur musicians and dancers, many of them retirees and housewives, or “Da Ma” [4]. These activities would cause numerous complaints if they happened in other parts of the world, but in China they are part of everyday life.

According to the ISO soundscape guideline, people can use sound walks, questionnaire surveys, and even lab tests to determine sound quality during a soundscape design process [3]. With the advance of virtual reality technologies, we believe it is now possible to create an application that immerses designers and community stakeholders in a simulated environment, so that they can perceive and compare changes in sound quality and provide feedback on different soundscape designs. An app has been developed specifically for this purpose. Figure 1 shows a simulated environment in which a student or visitor arrives at the school campus, walks across the lawn, passes a multifunctional court, and enters an open area with table tennis tables. She or he can experience different ambient sounds and can click an object to increase or decrease the volume of sound from that object. After hearing sounds at different locations from different sources, the person can evaluate the level of acoustic comfort at each location and express their feelings toward the overall soundscape. She or he can rate the sonic environment based on its perceived loudness and its pleasantness using a 5-point scale from 1 = ‘heard nothing/not at all pleasant’ to 5 = ‘very loud/pleasant’. In addition, she or he is asked to describe the acoustic environment and soundscape in free words, because of the multi-dimensional nature of the sonic environment.

Figure 1. A simulated soundwalk in a school campus.

  1. To, W. M., Mak, C. M., and Chung, W. L. Are the noise levels acceptable in a built environment like Hong Kong? Noise and Health, 2015. 17(79): 429-439.
  2. ISO. ISO 12913-1:2014 Acoustics – Soundscape – Part 1: Definition and Conceptual Framework, Geneva: International Organization for Standardization, 2014.
  3. Kang, J. and Schulte-Fortkamp, B. (Eds.). Soundscape and the Built Environment, CRC Press, 2016.
  4. Buckley, C. and Wu, A. In China, the ‘Noisiest Park in the World’ Tries to Tone Down Rowdy Retirees, NYTimes.com, from http://www.nytimes.com/2016/07/04/world/asia/china-chengdu-park-noise.html , 2016.