A virtual reality system to ‘test drive’ hearing aids in real-world settings

Matthew Neal – mathew.neal.2@louisville.edu
Instagram: @matthewneal32

Department of Otolaryngology and other Communicative Disorders
University of Louisville
Louisville, Kentucky 40208
United States

Popular version of 3pID2 – A hearing aid “test drive”: Using virtual acoustics to accurately demonstrate hearing aid performance in realistic environments
Presented at the 184th ASA Meeting
Read the abstract at https://doi.org/10.1121/10.0018736

Many of the struggles experienced by patients and audiologists during the hearing aid fitting process stem from a simple difficulty: it is really hard to describe in words how something will sound, especially if you have never heard it before. Currently, audiologists use brochures and their own words to counsel a patient during the hearing aid purchase process, but devices often must be purchased before patients can try them in their everyday lives. This research project has developed virtual reality (VR) hearing aid demonstration software that allows patients to listen to what hearing aids will sound like in real-world settings, such as noisy restaurants, churches, and the places where they need devices the most. Using the system, patients can make more informed purchasing decisions, and audiologists can program hearing aids to an individual’s needs and preferences more quickly.

This technology can also be thought of as a VR ‘test drive’ of wearing hearing aids, letting audiologists act as tour guides as patients try out features on a hearing aid. After turning a new hearing aid feature on, a patient will hear the devices update in a split second, and the audiologist can ask, “Was it better before or after the adjustment?” On top of getting device settings correct, hearing aid purchasers must also decide which ‘technology level’ they would like to purchase. Patients typically choose among three or four technology levels, ranging from basic to premium, with an added cost of around $1,000 per increase in level. Higher technology levels incorporate the latest processing algorithms, but patients must decide if they are worth the price, often without the ability to hear the difference. The VR hearing aid demonstration lets patients try out these different levels of technology, hear the benefits of premium devices, and decide if the increase in speech intelligibility or listening comfort is worth the added cost.

A patient using the demo first puts on a custom pair of wired hearing aids. These hearing aids are the same devices that are sold in audiology clinics, but their microphones have been removed and replaced with wires for inputs. The wires are connected back to the VR program running on a computer, which simulates the audio in a given scene. For example, in the VR restaurant scene shown in Video 1, the software maps audio in a complex, noisy restaurant to the hearing aid microphones while worn by a patient. The wires send the audio that would have been picked up in the simulated restaurant to the custom hearing aids, and they process and amplify the sound just as they would in that setting. All of the audio is updated in real time so that a listener can rotate their head, just as they might do in the real world. Currently, the system is being further developed, and it is planned to be implemented in audiology clinics as an advanced hearing aid fitting and patient counseling tool.

Video 1: The VR software being used to demonstrate the Speech in Loud Noise program on a Phonak Audeo Paradise hearing aid. The audio in this video is the directly recorded output of the hearing aid, overlaid with a video of the VR system in operation. When the hearing aid is switched to the Speech in Loud Noise program on the phone app, it becomes much easier and more comfortable to listen to the frontal talker, highlighting the benefits of this feature in a premium hearing aid.

Virtual Reality Musical Instruments for the 21st Century

Rob Hamilton – hamilr4@rpi.edu
Twitter: @robertkhamilton

Rensselaer Polytechnic Institute, 110 8th St, Troy, New York, 12180, United States

Popular version of 1aCA3 – Real-time musical performance across and within extended reality environments
Presented at the 184th ASA Meeting
Read the abstract at https://doi.org/10.1121/10.0018060

Have you ever wanted to just wave your hands to be able to make beautiful music? Sad your epic air-guitar skills don’t translate into pop/rock super stardom? Given the speed and accessibility of modern computers, it may come as little surprise that artists and researchers have been looking to virtual and augmented reality to build the next generation of musical instruments. Borrowing heavily from video game design, a new generation of digital luthiers is already exploring new techniques to bring the joys and wonders of live musical performance into the 21st Century.

Image courtesy of Rob Hamilton.

One such instrument is ‘Coretet’: a virtual reality bowed string instrument that can be reshaped by the user into familiar forms such as a violin, viola, cello or double bass. While wearing a virtual reality headset such as Meta’s Oculus Quest 2, performers bow and pluck the instrument in familiar ways, albeit without any physical interaction with strings or wood. Sound is generated in Coretet using a ‘physical model’, a computer model of a bowed or plucked string, driven by the motion of a performer’s hands and their VR game controllers. And borrowing from multiplayer online games, Coretet performers can join a shared network server and perform music together.
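To give a flavor of what a string ‘physical model’ can look like in code, here is a minimal sketch of the classic Karplus-Strong plucked-string algorithm. It is an illustration only and is not Coretet’s actual model; the function name `pluck_string` and all parameter values are assumptions chosen for this example.

```python
import random


def pluck_string(frequency_hz, duration_s, sample_rate=44100, damping=0.996, seed=0):
    """Return samples in [-1, 1] approximating a plucked string (Karplus-Strong)."""
    rng = random.Random(seed)
    # The delay-line length sets the pitch: one trip around the line is one period.
    period = max(2, int(round(sample_rate / frequency_hz)))
    # The "pluck": fill the delay line with a short burst of noise.
    delay_line = [rng.uniform(-1.0, 1.0) for _ in range(period)]

    samples = []
    for n in range(int(duration_s * sample_rate)):
        i = n % period
        current = delay_line[i]
        following = delay_line[(i + 1) % period]
        samples.append(current)
        # Averaging neighbours acts as a gentle low-pass filter, and the damping
        # factor makes the note die away, loosely like energy loss in a real string.
        delay_line[i] = damping * 0.5 * (current + following)
    return samples


note = pluck_string(frequency_hz=220.0, duration_s=0.5)
print(len(note), "samples generated")
```

In an instrument like the one described above, values such as the pitch or the moment of the pluck would be driven by the performer’s tracked hand and controller motion rather than fixed in advance.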

Our understanding of music, and of live musical performance on traditional physical instruments, is tightly coupled to time: when a finger plucks a string, or a stick strikes a drum head, we expect a sound to be generated immediately, without any delay or latency. And while modern computers are capable of streaming large amounts of data at the speed of light (significantly faster than the speed of sound), bottlenecks in the CPUs or GPUs themselves, in the code designed to mimic our physical interactions with instruments, or even in the network connections that link users and computers often introduce latency, making virtual performances feel sluggish or awkward.

This research focuses on some common causes for this kind of latency and looks at ways that musicians and instrument designers can work around or mitigate these latencies both technically and artistically.
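As a rough illustration of how such delays add up, the sketch below totals a made-up latency budget and expresses it as the distance sound would travel through air in the same time. Every stage name and number here is a hypothetical value chosen for the example, not a measurement from this research.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # approximate value for air at room temperature


def buffer_latency_ms(buffer_size_samples, sample_rate_hz):
    """Time needed to fill one audio buffer, in milliseconds."""
    return 1000.0 * buffer_size_samples / sample_rate_hz


def equivalent_distance_m(latency_ms):
    """Distance sound travels through air during the given delay."""
    return SPEED_OF_SOUND_M_PER_S * latency_ms / 1000.0


# Hypothetical latency budget for one networked VR performer (illustrative only).
stages_ms = {
    "controller tracking": 11.0,
    "audio buffer (256 samples at 48 kHz)": buffer_latency_ms(256, 48000),
    "network round trip": 30.0,
}
total_ms = sum(stages_ms.values())

for name, value in stages_ms.items():
    print(f"{name}: {value:.1f} ms")
print(f"total: {total_ms:.1f} ms, comparable to hearing the instrument "
      f"from {equivalent_distance_m(total_ms):.1f} m away")
```

Thinking of latency as distance can be a helpful intuition: musicians already cope with small acoustic delays when they stand a few meters apart on stage.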

Coretet overview video: Video courtesy of Rob Hamilton.

4pNS2 – Use of virtual reality in designing and developing sonic environment for dementia care facilities

Arezoo Talebzadeh – arezoo.talebzadeh@UGent.be
Ph.D. Student
Ghent University
Tech Lane Ghent Science Park, 126, B-9052 Gent, Belgium

Popular version of 4pNS2 – Use of virtual reality in designing and developing soundscape for dementia care facilities
Presented in the afternoon of May 26, 2022
182nd ASA Meeting in Denver, Colorado

Sound is essential in making people aware of their environment; sound also helps in recognizing the time of day. People with dementia have difficulty understanding and interpreting what their senses tell them. The sonic environment can help them navigate through a space and keep track of the time; it can also reduce their agitation and anxiety. Care facilities, nursing homes, and long-term care (LTC) homes usually have an acoustic environment that is unfamiliar to anyone new to the place. A well-designed soundscape can enhance the feeling of safety, elevate the mood and enrich the atmosphere. Designing a soundscape that fosters well-being for a person with dementia is challenging, as mental disorders change one’s perception of space. A soundscape is the sonic environment as perceived by a person in context.

This research aims to enhance the soundscape experience during the design and development of care facilities by using Virtual Reality and defining the context during the process.

Walking through the space while hearing the soundscape demonstrates how sound helps spatial orientation and understanding of time. Specific rooms can have a unique sound dedicated to them to help residents find the location. For example, a natural soundscape could play in the lounge, or the sound of coffee brewing in the dining room during breakfast. Birdsong inside residents’ rooms in the morning can elevate their mood and help them start their day.

Sound is neither visible nor tangible; therefore, it is hard to examine and experience a sound design before it is implemented. Virtual Reality is a suitable tool for demonstrating sound augmentation and its outcome. By walking through the space and listening to the augmented sonic environment, caregivers and family members can participate in the design process, as they are most familiar with the person with dementia and their interests. This method helps in evaluating the soundscape. People with dementia have a different mental model. Virtual Reality can help represent diverse mental models and help others empathize with people with dementia.

5pSP6 – Assessing the Accuracy of Head Related Transfer Functions in a Virtual Reality Environment

Joseph Esce – esce@hartford.edu
Eoin A King – eoking@hartford.edu
Acoustics Program and Lab
Department of Mechanical Engineering
University of Hartford
200 Bloomfield Avenue
West Hartford
CT 06119

Popular version of paper 5pSP6: “Assessing the Accuracy of Head Related Transfer Functions in a Virtual Reality Environment”, presented Friday afternoon, November 9, 2018, 2:30 – 2:45pm, RATTENBURY A/B, ASA 176th Meeting/2018 Acoustics Week in Canada, Victoria, Canada.

Introduction
While visual graphics in Virtual Reality (VR) systems are very well developed, the manner in which acoustic environments and sounds may be recreated in a VR system is not. Currently, the standard procedure to represent sound in a virtual environment is to use a generic head-related transfer function (HRTF), i.e. a user selects a generic HRTF from a library, with limited personal information. It is essentially a ‘best-guess’ representation of an individual’s perception of a sound source. This limits the accuracy of the representation of the acoustic environment, as every person has an HRTF that is unique to themselves.

What is an HRTF?
If you close your eyes and someone jangles keys behind your head, you will be able to identify the general location of the keys just from the sound you hear. This is possible because your head, outer ears, and torso alter the sound on its way to each eardrum in a way that depends on the direction it comes from. An HRTF is a mathematical function that captures these transformations, and it can be used to recreate the sound of those keys in a pair of headphones, so that the recording appears to come from a particular direction. However, everyone has a differently shaped head and ears, so HRTFs are unique to each person. The objective of our work was to determine how the accuracy of sound localization in a VR world varies for different users, and how we can improve it.
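To make the idea concrete, here is a minimal sketch of how a sound can be given a direction over headphones: the recording is filtered separately for each ear. In the time domain an HRTF corresponds to a head-related impulse response (HRIR), and applying it is a convolution. The two impulse responses below are toy values invented for illustration (a simple delay and level difference between the ears), not measured HRTF data, and the function names are assumptions.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution of two lists of samples."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_binaural(mono, hrir_left, hrir_right):
    """Filter one mono recording into a (left, right) pair of headphone signals."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy impulse responses for a source on the listener's right (assumed values):
# the right ear hears the sound first and louder, the left ear later and quieter.
hrir_right = [1.0]
hrir_left = [0.0] * 30 + [0.5]  # 30 samples is about 0.6 ms at 48 kHz

mono = [1.0, 0.0, -1.0]  # a tiny stand-in for the recording of the keys
left, right = render_binaural(mono, hrir_left, hrir_right)
print("left channel length:", len(left), "right channel length:", len(right))
```

A measured HRIR is far richer than this toy pair, since it also encodes the direction-dependent coloring caused by the shape of the outer ear, which is the part that differs most from person to person.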

Test procedure
In our tests, volunteers entered a VR world, which was essentially an empty room, and an invisible sound source made short bursts of noise at various positions in the room. Volunteers were asked to point to the location of the sound source, and results were captured to the nearest millimeter using the VR system’s motion tracking. We tested three cases: 1) where volunteers were not allowed to move their head to assist in the localization, 2) where some slight head movements were allowed to assist in sound localization, and 3) where volunteers could turn around freely and ‘search’ (with their ears) for the sound source. Head movement was monitored by using the VR system to track the volunteer’s eye movement, and if the volunteer moved, the sound source was switched off.

We observed that the accuracy with which volunteers were able to localize the sound source varied significantly from person to person. There was significant error when volunteers’ head movements were restricted, but the accuracy significantly improved when people were able to move around and listen to the sound source. This suggests that the initial impression of a sound’s location in a VR world is refined when the user can move their head while searching.

Future Work
We are currently analyzing our results in more detail to account for the different characteristics of each user (e.g. head size and the size and shape of the ear). Further, we are aiming to develop the experimental methodology to use machine learning algorithms, enabling each user to create a pseudo-personalized HRTF, which would improve the immersive experience for all VR users.

3aAA10 – Localization and externalization in binaural reproduction with sparse HRTF measurement grids

Zamir Ben-Hur – zami@post.bgu.ac.il
Boaz Rafaely – br@bgu.ac.il
Department of Electrical and Computer Engineering,
Ben-Gurion University of the Negev,
Beer-Sheva, 84105, Israel.

David Lou Alon – davidalon@fb.com
Ravish Mehra – ravish.mehra@oculus.com
Oculus & Facebook,
1 Hacker Way,
Menlo Park, CA 94025, USA.

Popular version of paper 3aAA10, “Localization and externalization in binaural reproduction with sparse HRTF measurement grids”.
Presented Wednesday morning, May 9, 2018, 11:40-11:55 AM,
175th ASA Meeting, Minneapolis.

High-quality spatial sound reproduction is important for many applications of virtual and augmented reality. Spatial audio gives the listener the sensation that sound arrives from the surrounding 3D space, leading to immersive virtual soundscapes. To create such a virtual sound scene over headphones, a technique called binaural reproduction is used. A key component in binaural reproduction is the head-related transfer function (HRTF). An HRTF is a mathematical representation that describes how a listener’s head, ears, and torso affect the acoustic path from a sound source’s direction into the ear canal [1]. An HRTF set is typically measured for an individual in an anechoic chamber using an HRTF measurement system. Alternatively, a generic HRTF set is measured using a manikin. To achieve a realistic spatial audio experience, in terms of sound localization and externalization, a high-resolution personalized HRTF (P-HRTF) is necessary. Localization refers to the ability to present a sound at an accurate location in 3D space. Externalization is the ability to perceive the virtual sound as coming from outside the head, as in real-world environments.

A typical P-HRTF set is composed of several hundred to several thousand source directions measured around a listener, using a procedure that requires expensive and specialized equipment and can take a long time to complete. This motivates the development of methods that require fewer spatial samples but still allow accurate reconstruction of P-HRTF sets with high spatial resolution. Given only a sparsely measured P-HRTF, it is necessary to reconstruct the directions that were not measured, which introduces interpolation error that may lead to poor spatial audio reproduction [2]. It is therefore important to better understand this interpolation error and its effect on spatial perception. If the error is too large, a generic HRTF may be preferable to a sparse P-HRTF. Figure 1 illustrates the question addressed in this study.

Figure 1. Illustration of the challenge of this paper.

Prior studies suggested representing the HRTF in the spherical-harmonics (SH) domain. Using SH decomposition, it is possible to reconstruct a high-resolution P-HRTF from a small number of measurements [3,4]. When using the SH representation, reconstruction error can be caused by spatial aliasing and/or SH series truncation [4,5,6]. Aliasing refers to the loss of ability to represent high frequencies due to the limited number of measurements. Truncation error refers to the order limit imposed on the SH representation, which further limits the spatial resolution. With a small number of measurements, both errors contribute to the overall reconstruction error.
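For readers who like to see the formula, an order-limited SH representation can be sketched as follows. The symbols f (frequency), θ and φ (direction angles), and h_nm (SH coefficients) are notation chosen here for illustration and may differ from the notation used in the paper.

```latex
H(f, \theta, \phi) \;\approx\; \sum_{n=0}^{N} \sum_{m=-n}^{n} h_{nm}(f)\, Y_n^m(\theta, \phi),
\qquad
\#\{h_{nm}\} = (N+1)^2,
\qquad
Q \;\ge\; (N+1)^2 .
```

An order-N representation has (N+1)^2 coefficients per frequency, so at least that many measured directions Q are needed to determine them, and some sampling schemes need more. For example, N = 10 gives (10+1)^2 = 121 coefficients, which matches a grid of 121 measurements, while a grid of 169 = 13^2 measurements could support an order of up to N = 12 under this counting rule.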

In this study, the effect of sparse measurement grids on the reproduced binaural signal is perceptually evaluated through virtual localization and externalization tests under varying conditions.

Six adult subjects participated in the experiment. The experiment was performed with the Oculus Rift headset with a pair of floating earphones (see Fig. 2). These floating earphones enabled per-user headphone equalization for the study. A stimulus of 10-second band-pass-filtered white noise (0.5-15 kHz) was played back using a real-time binaural reproduction system. The system allows reproduction of a virtual sound source in a given direction, using a specific HRTF set that was chosen according to the test condition. In each trial, the sound was played from a different direction, and the subject was instructed to point to this direction using a virtual laser pointer controlled by the subject’s head movement. Next, the participant was asked to report whether the stimulus was externalized or internalized.

Figure 2. The experiment setup, including a Rift headset and floating earphones.

We analyzed the localization results by means of angular errors. The angular errors were calculated as the difference between the perceptually localized position and the true target position. Figure 3 depicts the mean sound localization performance for different test conditions (Q, N), where Q is the number of measurements and N is the SH order. The figure shows the error averaged across all directions (upper plot) and the errors in azimuth and elevation separately (lower plots). The externalization results were analyzed as the average percentage of responses that the subjects marked as being externalized. Figure 4 shows the externalization results averaged across all directions and subjects.
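One common way to compute such errors is sketched below: the overall angular error is the angle between the target direction and the direction the subject pointed to, and the azimuth and elevation errors are the separate differences in each angle. The coordinate convention and function names are assumptions made for this example and may differ from the analysis used in the study.

```python
import math


def direction_vector(azimuth_deg, elevation_deg):
    """Unit vector for a direction given as (azimuth, elevation) in degrees."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))


def angular_error_deg(target, response):
    """Angle in degrees between the target and the pointed direction."""
    a, b = direction_vector(*target), direction_vector(*response)
    dot = sum(x * y for x, y in zip(a, b))
    dot = max(-1.0, min(1.0, dot))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(dot))


def wrap_deg(angle):
    """Wrap an angle difference into the range [-180, 180) degrees."""
    return (angle + 180.0) % 360.0 - 180.0


def azimuth_elevation_errors(target, response):
    """Absolute azimuth and elevation errors in degrees, reported separately."""
    azimuth_error = abs(wrap_deg(response[0] - target[0]))
    elevation_error = abs(response[1] - target[1])
    return azimuth_error, elevation_error


target = (350.0, 10.0)    # hypothetical true source direction
response = (10.0, -5.0)   # hypothetical direction the subject pointed to
print("overall error (deg):", round(angular_error_deg(target, response), 1))
print("azimuth, elevation errors (deg):", azimuth_elevation_errors(target, response))
```

Wrapping the azimuth difference matters because directions of 350° and 10° are only 20° apart, not 340°.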

The results demonstrate that a higher number of measurements leads to better localization and externalization performance, with most of the effect in the elevation angles. A P-HRTF with 121 measurements and SH order 10 achieves results similar to those of a generic HRTF. The results suggest that achieving better localization and externalization performance than a generic HRTF requires at least 169 directional measurements.

Figure 3. Localization results: angular error (in degrees) for different conditions (Q, N), where Q is the number of measurements and N is the SH order. The upper plot shows the overall angular error, and the lower plots show separate errors for azimuth and elevation.

Figure 4. Results of externalization performance.


[1] J. Blauert, Spatial Hearing: The Psychophysics of Human Sound Localization. MIT Press, 1997.

[2] P. Runkle, M. Blommer, and G. Wakefield, “A comparison of head related transfer function interpolation methods,” in IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics. IEEE, 1995, pp. 88–91.

[3] M. J. Evans, J. A. Angus, and A. I. Tew, “Analyzing head-related transfer function measurements using surface spherical harmonics,” The Journal of the Acoustical Society of America, vol. 104, no. 4, pp. 2400–2411, 1998.

[4] G. D. Romigh, D. S. Brungart, R. M. Stern, and B. D. Simpson, “Efficient real spherical harmonic representation of head-related transfer functions,” IEEE Journal of Selected Topics in Signal Processing, vol. 9, no. 5, pp. 921–930, 2015.

[5] B. Rafaely, B. Weiss, and E. Bachmat, “Spatial aliasing in spherical microphone arrays,” IEEE Transactions on Signal Processing, vol. 55, no. 3, pp. 1003–1010, 2007.

[6] A. Avni, J. Ahrens, M. Geier, S. Spors, H. Wierstorf, and B. Rafaely, “Spatial perception of sound fields recorded by spherical microphone arrays with varying spatial resolution,” The Journal of the Acoustical Society of America, vol. 133, no. 5, pp. 2711–2721, 2013.