3aAA10 – Localization and externalization in binaural reproduction with sparse HRTF measurement grids
Zamir Ben-Hur – zami@post.bgu.ac.il
Boaz Rafaely – br@bgu.ac.il
Department of Electrical and Computer Engineering,
Ben-Gurion University of the Negev,
Beer-Sheva, 84105, Israel.
David Lou Alon – davidalon@fb.com
Ravish Mehra – ravish.mehra@oculus.com
Oculus & Facebook,
1 Hacker Way,
Menlo Park, CA 94025, USA.
Popular version of paper 3aAA10, “Localization and externalization in binaural reproduction with sparse HRTF measurement grids”.
Presented Wednesday morning, May 9, 2018, 11:40-11:55 AM,
175th ASA Meeting, Minneapolis.
High-quality spatial sound reproduction is important for many applications of virtual and augmented reality. Spatial audio gives the listener the sensation that sound arrives from the surrounding 3D space, leading to immersive virtual soundscapes. To create such a virtual sound scene over headphones, a technique called binaural reproduction is used. A key component in binaural reproduction is the head-related transfer function (HRTF). An HRTF is a mathematical representation that describes how a listener’s head, ears, and torso affect the acoustic path from a sound source’s direction into the ear canal [1]. An HRTF set is typically measured for an individual in an anechoic chamber using an HRTF measurement system. Alternatively, a generic HRTF set is measured using a manikin. To achieve a realistic spatial audio experience, in terms of sound localization and externalization, a high-resolution personalized HRTF (P-HRTF) is necessary. Localization refers to the ability to present a sound at an accurate location in 3D space. Externalization is the ability to perceive the virtual sound as coming from outside the head, as in real-world environments.
A typical P-HRTF set is composed of several hundred to several thousand source directions measured around a listener, using a procedure that requires expensive, specialized equipment and can take a long time to complete. This motivates the development of methods that require fewer spatial samples but still allow accurate reconstruction of P-HRTF sets with high spatial resolution. Given only a sparsely measured P-HRTF, it is necessary to reconstruct the directions that were not measured, which introduces interpolation error that may lead to poor spatial audio reproduction [2]. It is therefore important to better understand this interpolation error and its effect on spatial perception. If the error is too large, a generic HRTF may be preferable to a sparse P-HRTF. Figure 1 illustrates the question addressed in this study.
Prior studies suggested representing the HRTF in the spherical-harmonics (SH) domain. Using SH decomposition, it is possible to reconstruct a high-resolution P-HRTF from a small number of measurements [3,4]. With an SH representation, the reconstruction error can be caused by spatial aliasing and/or SH series truncation [4,5,6]. Aliasing refers to the loss of ability to represent high frequencies due to the limited number of measurements. Truncation error refers to the order limit imposed on the SH representation, which further limits the spatial resolution. With a small number of measurements, both errors contribute to the overall reconstruction error.
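The link between the number of measurements and the SH order can be made concrete with a little arithmetic. An SH expansion truncated at order N has (N + 1)^2 coefficients, so estimating them requires at least that many measured directions. The sketch below assumes this counting rule is the only constraint; in practice the usable order also depends on the sampling scheme, and the function names are illustrative rather than taken from the paper.

```python
import math


def num_sh_coefficients(order: int) -> int:
    """Number of SH coefficients for orders 0..order (inclusive)."""
    return (order + 1) ** 2


def max_sh_order(num_measurements: int) -> int:
    """Largest order N with (N + 1)**2 <= num_measurements.

    Assumption: only the coefficient count limits the order; real
    sampling grids may support a lower order than this bound.
    """
    if num_measurements < 1:
        raise ValueError("need at least one measurement")
    return math.isqrt(num_measurements) - 1


if __name__ == "__main__":
    for q in (121, 169):
        n = max_sh_order(q)
        print(f"Q={q}: max order N={n}, coefficients={num_sh_coefficients(n)}")
```

Under this rule, a grid of 121 directions supports at most order 10, which matches the (Q, N) pairing reported later in the text.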
In this study, the effect of sparse measurement grids on the reproduced binaural signal is perceptually evaluated through virtual localization and externalization tests under varying conditions.
Six adult subjects participated in the experiment. The experiment was performed with the Oculus Rift headset and a pair of floating earphones (see Fig. 2). These floating earphones enabled per-user headphone equalization for the study. The stimulus, 10 seconds of band-pass-filtered white noise (0.5-15 kHz), was played back using a real-time binaural reproduction system. The system reproduces a virtual sound source in a given direction, using a specific HRTF set chosen according to the test condition. In each trial, the sound was played from a different direction, and the subject was instructed to point to this direction using a virtual laser pointer controlled by the subject’s head movement. Next, the participant was asked to report whether the stimulus was externalized or internalized.
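A stimulus of this kind can be approximated in a few lines. The sketch below is not the authors' implementation: the 48 kHz sample rate, the brick-wall FFT filter, the peak normalization, and the function name are all assumptions made for illustration. Only the duration and the 0.5-15 kHz band come from the description above.

```python
import numpy as np


def bandpassed_noise(duration_s=10.0, fs=48000, f_lo=500.0, f_hi=15000.0, seed=0):
    """White noise restricted to [f_lo, f_hi] Hz, peak-normalized to 1.

    Assumptions: 48 kHz sample rate and an ideal (brick-wall) filter
    applied in the frequency domain; the study's filter is not specified.
    """
    rng = np.random.default_rng(seed)
    n = int(round(duration_s * fs))
    noise = rng.standard_normal(n)

    # Zero every frequency bin outside the pass band.
    spectrum = np.fft.rfft(noise)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spectrum[(freqs < f_lo) | (freqs > f_hi)] = 0.0

    signal = np.fft.irfft(spectrum, n=n)
    return signal / np.max(np.abs(signal))


if __name__ == "__main__":
    stimulus = bandpassed_noise()
    print(stimulus.shape)
```

In a binaural renderer, a mono signal like this would then be filtered with the left-ear and right-ear HRTFs for the target direction.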
We analyzed the localization results by means of angular errors. The angular errors were calculated as the difference between the perceptually localized position and the true target position. Figure 3 depicts the mean sound localization performance for different test conditions (Q, N), where Q is the number of measurements and N is the SH order. The figure shows the error averaged across all directions (upper plot) and the errors in azimuth and elevation separately (lower plots). The externalization results were analyzed as the average percentage of responses that the subjects marked as externalized. Figure 4 shows the externalization results averaged across all directions and subjects.
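The two measures described above can be sketched as follows. The exact error definition used in the study is not given here, so this example assumes the overall angular error is the great-circle angle between the target and response directions, and that azimuth errors are wrapped to the range [-180, 180) degrees. The coordinate convention and all function names are illustrative.

```python
import math


def unit_vector(azimuth_deg, elevation_deg):
    """Direction (azimuth, elevation) in degrees as a 3D unit vector."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))


def angular_error_deg(target, response):
    """Great-circle angle in degrees between two (azimuth, elevation) pairs."""
    a = unit_vector(*target)
    b = unit_vector(*response)
    dot = sum(x * y for x, y in zip(a, b))
    dot = max(-1.0, min(1.0, dot))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(dot))


def wrapped_difference_deg(response_deg, target_deg):
    """Signed angle difference wrapped to [-180, 180) degrees."""
    return (response_deg - target_deg + 180.0) % 360.0 - 180.0


def externalization_rate(responses):
    """Percentage of trials reported as externalized (True)."""
    return 100.0 * sum(bool(r) for r in responses) / len(responses)


if __name__ == "__main__":
    print(angular_error_deg((30.0, 0.0), (40.0, 10.0)))
    print(wrapped_difference_deg(350.0, 10.0))
    print(externalization_rate([True, False, True, True]))
```

Wrapping the azimuth difference avoids counting a response at 350 degrees against a target at 10 degrees as a 340-degree error.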
The results demonstrate that a higher number of measurements leads to better localization and externalization performance, with most of the effect in the elevation angles. A P-HRTF with 121 measurements and SH order 10 achieves results similar to those of a generic HRTF. The results suggest that at least 169 directional measurements are required to achieve better localization and externalization performance than a generic HRTF.
References
[1] J. Blauert, Spatial Hearing: The Psychophysics of Human Sound Localization. MIT Press, 1997.
[2] P. Runkle, M. Blommer, and G. Wakefield, “A comparison of head related transfer function interpolation methods,” in IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics. IEEE, 1995, pp. 88–91.
[3] M. J. Evans, J. A. Angus, and A. I. Tew, “Analyzing head-related transfer function measurements using surface spherical harmonics,” The Journal of the Acoustical Society of America, vol. 104, no. 4, pp. 2400–2411, 1998.
[4] G. D. Romigh, D. S. Brungart, R. M. Stern, and B. D. Simpson, “Efficient real spherical harmonic representation of head-related transfer functions,” IEEE Journal of Selected Topics in Signal Processing, vol. 9, no. 5, pp. 921–930, 2015.
[5] B. Rafaely, B. Weiss, and E. Bachmat, “Spatial aliasing in spherical microphone arrays,” IEEE Transactions on Signal Processing, vol. 55, no. 3, pp. 1003–1010, 2007.
[6] A. Avni, J. Ahrens, M. Geier, S. Spors, H. Wierstorf, and B. Rafaely, “Spatial perception of sound fields recorded by spherical microphone arrays with varying spatial resolution,” The Journal of the Acoustical Society of America, vol. 133, no. 5, pp. 2711–2721, 2013.