4aAA10 – Acoustic Effects of Face Masks on Speech: Impulse Response Measurements Between Two Head and Torso Simulators

Victoria Anderson – vranderson@unomaha.edu
Lily Wang – lilywang@unl.edu
Chris Stecker – cstecker@spatialhearing.org
University of Nebraska–Lincoln (Omaha campus)
1110 S 67th Street
Omaha, Nebraska

Popular version of 4aAA10 – Acoustic effects of face masks on speech: Impulse response measurements between two binaural mannikins
Presented Thursday morning, December 2nd, 2021
181st ASA Meeting
Click here to read the abstract

Due to the COVID-19 pandemic, masks that cover both the mouth and nose have been used to reduce the spread of illness. While they are effective at preventing the transmission of COVID, they have also had a noticeable impact on communication. Many people find it difficult to understand a speaker who is wearing a mask. Masks affect the sound level and direction of speech and, if they are opaque, can block visual cues that help in understanding speech. Many studies have explored the effect face masks have on understanding speech.

The purpose of this project was to begin assembling a database of the effects that common face masks have on impulse responses measured from one head and torso simulator (HATS) to another. An impulse response is a measurement of how sound radiates out from a source and bounces through a space. The resulting impulse response data can be used by researchers to simulate masked verbal communication scenarios.

To isolate how the masks affect the impulse response, all measurements were taken in an anechoic chamber, so that no reverberant sound would be included in the measurement. One HATS was placed in the middle of the chamber as the source, and another HATS was placed at varying distances as the receiver. The mouth of the source HATS was covered with various face coverings: paper, cloth, N95, and nano masks, and a face shield. These were worn individually and in combination with the face shield to cover a wider range of masked conditions that could reasonably occur in real life. The receiver HATS took measurements at 90° and 45° from the source, at distances of 6’ and 8’. A sine sweep, a signal that changes frequency over a set amount of time, was played to determine the impulse response for each masked condition at every location. The receiver HATS measured the impulse response at both the right and left ears, and the software used to produce the sine sweep was also used to analyze and store the measurement data. These data will be available for use in simulated communication scenarios to better portray how sound behaves in a space when it comes from a masked speaker.
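The summary does not name the measurement software; purely as a sketch of the general sine-sweep method it describes (an exponential sweep played through the source HATS, then deconvolution of each ear recording with a matching inverse filter), the outline below shows how an impulse response could be recovered. The sweep parameters and function names here are illustrative, not the study's settings.

```python
import numpy as np

def exponential_sweep(f1=20.0, f2=20000.0, duration=10.0, fs=48000):
    """Logarithmic sine sweep plus the inverse filter used for deconvolution."""
    t = np.arange(int(duration * fs)) / fs
    rate = np.log(f2 / f1)
    sweep = np.sin(2 * np.pi * f1 * duration / rate * (np.exp(t * rate / duration) - 1.0))
    # Time-reversed sweep, amplitude-corrected so its spectrum compensates
    # for the sweep's emphasis on low frequencies.
    inverse = sweep[::-1] * np.exp(-t * rate / duration)
    return sweep, inverse

def impulse_response(ear_recording, inverse):
    """Deconvolve one ear's recording of the sweep to recover the impulse response."""
    return np.convolve(ear_recording, inverse, mode="full")
```

One such impulse response would be obtained for each combination of mask condition, angle, distance, and ear.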


 

3aEA7 – Interactive Systems for Immersive Spaces

Samuel Chabot – chabos2@rpi.edu
Jonathan Mathews – mathej4@rpi.edu
Jonas Braasch – braasj@rpi.edu
Rensselaer Polytechnic Institute
110 8th St
Troy, NY, 12180

Popular version of 3aEA7 – Multi-user interactive systems for immersive virtual environments
Presented Wednesday morning, December 01, 2021
181st ASA Meeting
Click here to read the abstract

In the past few years, immersive spaces have become increasingly popular. These spaces, most often used for exhibits and galleries, incorporate large displays that completely envelop groups of people, speaker arrays, and even reactive elements that respond to the actions of the visitors within. One of the primary challenges in creating productive applications for these environments is the integration of intuitive interaction frameworks. For users to take full advantage of these spaces, whether for productivity, education, or entertainment, the interfaces used to interact with data should be easy to understand and should provide predictable feedback. In the Collaborative Research-Augmented Immersive Virtual Environment, or CRAIVE-Lab, at Rensselaer Polytechnic Institute, we have integrated a variety of technologies to foster natural interaction with the space. First, we developed a dynamic display environment for our immersive screen, written in JavaScript, to easily create display modules for everything from images to remote desktops. Second, we have incorporated spatial information into these display objects, so that audiovisual content presented on the screen generates spatialized audio over our 128-channel speaker array at the corresponding location. Finally, we have installed a multi-sensor platform that integrates a top-down camera array and a 16-channel spherical microphone to provide continuous tracking of multiple users, voice activity detection associated with each user, and isolated audio.
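The CRAIVE-Lab's display environment is written in JavaScript and is not reproduced here; purely as an illustrative sketch of the underlying idea (an on-screen object's position is turned into an azimuth, which is then panned between the two nearest loudspeakers of a surrounding ring), consider something like the following. The function name, the constant-power panning law, and the assumption of an evenly spaced ring are ours, not a description of the lab's actual code.

```python
import math

def ring_speaker_gains(azimuth_deg, n_speakers=128):
    """Constant-power panning of a source azimuth onto an evenly spaced speaker ring."""
    spacing = 360.0 / n_speakers
    pos = (azimuth_deg % 360.0) / spacing       # fractional speaker index
    lo = int(pos) % n_speakers                  # nearest speaker "below" the source
    hi = (lo + 1) % n_speakers                  # its neighbor
    frac = pos - int(pos)
    gains = [0.0] * n_speakers
    gains[lo] = math.cos(frac * math.pi / 2.0)  # crossfade that preserves total power
    gains[hi] = math.sin(frac * math.pi / 2.0)
    return gains
```

In such a scheme, an audio object anchored to a display module would update its azimuth as the module moves across the screen, so the sound appears to come from the corresponding location.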

By combining these technologies, we can create a user experience within the room that encourages dynamic interaction with data. For example, delivering a presentation in this space, a process that typically involves several file transfers and a lackluster visual experience, can now be done with minimal setup, using the presenter’s own device, and with spatial audio when needed.

Control of lights and speakers is handled through a unified control system. Feedback from the sensor system allows display elements to be positioned relative to the user. Identified users can take ownership of specific elements on the display and interact with the system concurrently, which makes group interactions and shared presentations far less cumbersome than typical methods. The elements that make up the CRAIVE-Lab are not particularly novel, as far as contemporary immersive rooms are concerned. However, they intertwine into a network that provides functionality for the occupants far greater than the sum of its parts.

2pAB4 – Towards understanding how dolphins use sound to understand their environment

YeonJoon Cheong – yjcheong@umich.edu
K. Alex Shorter – kshorter@umich.edu
Bogdan-Ioan Popa – bipopa@umich.edu
University of Michigan, Ann Arbor
2350 Hayward St
Ann Arbor, MI 48109-2125

Popular version of 2pAB4 – Acoustic scene modeling for echolocation in bottlenose dolphin
Presented Tuesday Morning, November 30, 2021
181st ASA Meeting
Click here to read the abstract

Dolphins are excellent at using ultrasound to explore their surroundings and find hidden objects. In a process called echolocation, dolphins project outgoing ultrasonic pulses called clicks and receive echoes from distant objects, which they convert into a model of the surroundings. Despite significant research on echolocation, how dolphins process echoes to find objects in cluttered environments, and how they adapt their search strategy based on the received echoes, are still open questions.

Fig. 1. A target discrimination task where the dolphin finds and touches the target of interest. During the experiment the animal was asked to find a target shape in the presence of up to three additional “distraction” objects randomly placed in four locations (red dashed locations). The animal was blindfolded using “eye-cups”, and data from the trials were collected using sound (Dtag) and motion recording tags (MTag) on the animal, overhead video, and acoustic recorders at the targets.

Here we developed a framework that combines experimental measurements with physics-based models of the acoustic source and environment to provide new insight into echolocation. We conducted echolocation experiments at Dolphin Quest Oahu, Hawaii, in two stages. In the first stage, a dolphin was trained to search for a designated target using both vision and sound. In the second stage, the dolphin was asked to find the designated target, placed randomly in the environment among distraction objects, while blindfolded with suction-cup eye-cups (Fig. 1). After each trial, the dolphin was rewarded with a fish if it selected the correct target.
Target discrimination tasks have been used by many research groups to investigate echolocation. Interesting behavior has been observed during these tasks. For example, animals sometimes swim from object to object, carefully inspecting them before making a decision. Other times they swim without hesitation straight to the target. These types of behavior are often characterized using measurements of animal acoustics and movement, but how clutter in the environment changes the difficulty of the discrimination task or how much information the animals gather about the acoustic scene before target selection are not fully understood.
Our approach assumes that the dolphins memorize target echoes from different locations in the environment during training. We hypothesize that in a cluttered environment the dolphin selects the object that best matches the learned target echo signature, even if it is not an exact match. Our framework enables the calculation of a parameter that quantifies how well a received echo matches the learned echo, called the “likelihood parameter”. This parameter was used to build a map of the most likely target locations in the acoustic scene.
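The paper's exact definition of the likelihood parameter is not given in this summary; as a minimal sketch of the general idea (score each candidate position by how closely its echo matches the learned target signature, then collect the scores into a map), one could do something like the following, here using peak normalized cross-correlation as a stand-in matching metric.

```python
import numpy as np

def likelihood(received_echo, learned_template):
    """Similarity between a received echo and the learned target echo signature.
    Peak normalized cross-correlation is used here as one plausible metric."""
    r = received_echo - np.mean(received_echo)
    s = learned_template - np.mean(learned_template)
    denom = np.linalg.norm(r) * np.linalg.norm(s)
    if denom == 0.0:
        return 0.0
    return float(np.max(np.abs(np.correlate(r, s, mode="full"))) / denom)

def likelihood_map(echoes_by_position, learned_template):
    """One likelihood value per candidate position in the acoustic scene."""
    return {pos: likelihood(echo, learned_template)
            for pos, echo in echoes_by_position.items()}
```

Positions with high values in such a map would be the ones the animal is predicted to investigate first.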

During the experiments, the dolphin swam to and investigated positions in the environment with high predicted target likelihood, as estimated by our approach. When the cluttered scene resulted in multiple objects with high likelihood values, the animal was observed to move towards and scan those areas to collect information before the decision. In other scenarios, the computed likelihood parameter was large at only one position, which explained why the animal swam to that position without hesitation. These results suggest that dolphins might create a similar “likelihood map” as information is gathered before target selection.
The proposed approach provides important additional insight into the acoustic scene formed by echolocating dolphins and into how the animals use this evolving information to classify and locate targets. Our framework will lead to a more complete understanding of the complex perceptual process used by echolocating animals.

3aSC7 – Human Beatboxing: A Vocal Exploration

Alexis Dehais-Underdown – alexis-dehais-underdown@sorbonne-nouvelle.fr
Paul Vignes – vignes.paul@gmail.com
Lise Crevier-Buchman – lise.buchman1@gmail.com
Didier Demolin – didier.demolin@sorbonne-nouvelle.fr
Université Sorbonne-Nouvelle
13, rue de Santeuil
75005, Paris, FRANCE

Popular version of 3aSC7 – Human beatboxing: Physiological aspects of drum imitation
Presented Wednesday morning, December 1st, 2021
181st ASA Meeting, Seattle, Washington
Read the article in Proceedings of Meetings on Acoustics

We are interested in exploring the potential of the human vocal tract by understanding beatboxing production. Human Beatboxing (HBB) is a musical technique that uses the vocal tract to imitate musical instruments. Like languages such as French or English, HBB relies on the combination of smaller units into larger ones. Unlike linguistic systems, HBB carries no meaning: while we speak to be understood, beatboxers do not perform to be understood. Speech production obeys linguistic constraints that ensure efficient communication; for example, each language has a finite number of vowels and consonants. This is not the case for HBB, because beatboxers use a larger number of sounds. We hypothesize that beatboxers acquire a more accurate and more extensive knowledge of the physical capacities of the vocal tract, which allows them to use this larger inventory of sounds.

Acquisition of laryngoscopic data (left) and acoustic & aerodynamic data (right)

We used three techniques with five professional beatboxers: (1) aerodynamic recordings, (2) laryngoscopic recordings, and (3) acoustic recordings. Aerodynamic data give information about the pressure and airflow changes that result from articulatory movements. Laryngoscopic images show the different anatomical laryngeal structures and their role in beatboxing production. Acoustic data allow us to investigate sound characteristics in terms of frequency and amplitude. We extracted nine basic beatboxing sounds from our database: the classic kick drum, the closed hi-hat, the inward k-snare, and the lips roll, each with its humming variant, plus the cough snare. Humming is a beatboxing strategy that allows simultaneous and independent articulation in the mouth and melodic voice production in the larynx. Some of these sounds are illustrated here:

The preliminary results are very interesting. While speech is mainly produced on an egressive airflow from the lungs (i.e., the exhalation phase of breathing), HBB is not. We found a wide range of mechanisms used to produce the basic sounds. Mechanisms were described by where the airflow is set in motion (lungs, larynx, or mouth) and by the direction of the airflow (into or out of the vocal tract). The sounds show different combinations of airflow location and direction:
• buccal egressive (humming classic kick and closed hi-hat) and ingressive (humming k-snare and lips roll);
• pulmonic egressive (cough snare) and ingressive (classic inward k-snare and lips roll);
• laryngeal egressive (classic kick drum and closed hi-hat) and ingressive (classic k-snare and inward classic kick drum).

The same sound may be produced differently by different beatboxers yet remain perceptually similar. HBB displays high pressure values, which suggests that these mechanisms are more powerful than those used in quiet conversational speech.

In the absence of linguistic constraints, artists exploit the capacities of the vocal tract more freely. This raises several questions: how do they reorganize respiratory activity, how do they coordinate sounds with one another, and how do they avoid lesions or damage to vocal tract structures? Our research project will provide further analysis of the description and coordination of beatboxing sounds at different speeds, based on MRI, laryngoscopic, aerodynamic, and acoustic data.

____________________

See also: Alexis Dehais-Underdown, Paul Vignes, Lise Crevier-Buchman, and Didier Demolin, “In and out: production mechanisms in Human Beatboxing,” Proc. Mtgs. Acoust. 45, 060005 (2021). https://doi.org/10.1121/2.0001543

2aCA11 – Validating a phase-inversion procedure to assess the signal-to-noise ratios at the output of hearing aids with wide-dynamic-range compression

Donghyeon Yun1 – dongyun@iu.edu
Yi Shen2 – shenyi@uw.edu
Jennifer J Lentz1 – jjlentz@indiana.edu

1. Department of Speech, Language and Hearing Sciences, Indiana University Bloomington,
2631 East Discovery Parkway Bloomington, IN 47408
2. Department of Speech and Hearing Sciences, University of Washington,
1417 Northeast 42nd Street, Seattle, WA 98105-6246

Popular version of 2aCA11 – Measuring hearing aid compression algorithm preference with the Tympan
Presented at the 181st ASA Meeting
Click here to read the abstract

Speech understanding is challenging in background noise, especially for listeners with hearing loss. Although hearing aids can compensate for the loss of hearing sensitivity by amplifying incoming sounds, the target speech and the background noise are often amplified together. In this way, hearing aids do not “boost” the signal with respect to the noise. Although hearing aids make sounds louder, common processing in these devices may even make the signal smaller relative to the noise, because the techniques used to boost soft sounds but not loud ones are nonlinear in nature. The level of the signal relative to the noise is called the signal-to-noise ratio, or SNR. A lower SNR at the output of a hearing aid may make speech understanding more difficult. Thus, it is important to accurately assess the output SNR when prescribing hearing aids in an audiology clinic.

The phase-inversion technique

In this paper, we examined whether a specific technique for determining the SNR at the output of a hearing aid gives accurate results. In this phase-inversion technique, the hearing aid’s response to a target speech sound (S) embedded in background noise (N) is recorded. We also record responses with a polarity-“inverted” copy of the signal (S’) and of the noise (N’). By combining these recordings, we can calculate the SNR at the output of the hearing aid.
It has been difficult to determine whether this technique gives an accurate estimate of SNR because there is no way to calculate the true SNR at the output of a real hearing aid. However, we can do this with a simulated hearing aid. In the current study, we calculated the true output SNR using a hearing-aid simulation for a number of test conditions. We then compared these true values to values estimated using the phase-inversion technique under the same test conditions. The test conditions included: (1) various SNRs at the input of the simulated hearing aid, (2) hearing-aid configurations fitted to four typical profiles of hearing loss, (3) two types of background noise (two- and twenty-talker babble), and (4) various parameters of the nonlinear processing algorithm.
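As a rough illustration of the phase-inversion idea (not the study's own implementation), the sketch below estimates the output SNR from three aided recordings: speech plus noise, polarity-inverted speech plus noise, and speech plus polarity-inverted noise. Adding a pair of recordings cancels the component that was inverted, leaving an estimate of the other component; the array names and averaging details are our assumptions.

```python
import numpy as np

def phase_inversion_snr(y_sn, y_sinv_n, y_s_ninv):
    """Estimate a hearing aid's output SNR by phase inversion.

    y_sn     : aided recording of speech + noise
    y_sinv_n : aided recording of inverted speech + noise
    y_s_ninv : aided recording of speech + inverted noise
    (assumed to be time-aligned 1-D arrays of equal length)
    """
    # Speech cancels when the inverted-speech recording is added,
    # leaving roughly twice the processed noise.
    noise_est = 0.5 * (y_sn + y_sinv_n)
    # Noise cancels when the inverted-noise recording is added,
    # leaving roughly twice the processed speech.
    speech_est = 0.5 * (y_sn + y_s_ninv)
    return 10.0 * np.log10(np.mean(speech_est**2) / np.mean(noise_est**2))
```

Because wide-dynamic-range compression is nonlinear, the cancellation is only approximate, which is exactly why the technique needs the kind of validation described here.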

The output SNRs estimated using the phase-inversion technique agree well with the actual output SNRs.

In agreement with previous studies, the output SNR for the simulated hearing aid was different from the input SNR, and this mismatch between the output and input SNRs depended on the test condition. The differences between the actual and estimated output SNRs were very small, indicating satisfactory validity for the phase-inversion technique.

 

4aMU8 – Neural Plasticity for Music Processing in Young Adults: the Effect of Transcranial Direct Current Stimulation (tDCS)

Eghosa Adodo, Cameron Patterson, Yan H. Yu
Corresponding: yuy1@stjohns.edu
St. John’s University
8000 Utopia Parkway, Queens, New York, 11439

Popular version of 4aMU8 – Neural plasticity for music processing in young adults: The effect of transcranial direct current stimulation (tDCS)
Presented Thursday morning, December 2, 2021
181st ASA Meeting
Click here to read the abstract

Transcranial direct current stimulation (tDCS) is a noninvasive brain stimulation technique. It has increasingly been proposed and used as a way to enhance various communicative, cognitive, and emotional functions. However, it is not clear whether, how, and to what extent tDCS influences nonlinguistic processing, such as music processing. The purpose of this study was to examine how brain responses to music change as a result of noninvasive brain stimulation.

Twenty healthy young adults participated in our study. They first sat in a sound-shielded booth and listened to classic Western piano music while watching a muted movie. The music stream used in this study consisted of six types of music pattern changes (rhythm, intensity, pitch slide, location, pitch, and timbre) and lasted 14 minutes. Brain waves were recorded using a 65-electrode sensor cap. Each participant then received 10 minutes of tDCS over the frontal-central scalp regions, after which they listened to the music again while their brain waves were recorded a second time.

Multi-feature music oddball paradigm (permission to use the stimuli and paradigm was obtained from the original creator, Peter Vuust).
S = same sounds; D1 = pitch change; D2 = timbre change; D3 = location change; D4 = intensity change; D5 = pitch slide change; D6 = rhythm change.
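The analysis pipeline is not spelled out in this summary, but responses in a multi-feature oddball stream like this one are commonly summarized as difference waves (the averaged response to each change type minus the response to the unchanged sound), compared before and after stimulation. The sketch below illustrates that kind of comparison; the array layout and condition labels are assumptions, not the study's code.

```python
import numpy as np

# Assumed layout: epochs[code] is an array of shape (n_trials, n_channels, n_samples),
# with codes matching the paradigm above ("S" and "D1" through "D6").
DEVIANTS = ("D1", "D2", "D3", "D4", "D5", "D6")

def difference_waves(epochs):
    """Averaged response to each change type minus the response to the unchanged sound."""
    standard = epochs["S"].mean(axis=0)
    return {dev: epochs[dev].mean(axis=0) - standard for dev in DEVIANTS}

def tdcs_effect(pre_epochs, post_epochs):
    """Change in each difference wave from the pre-tDCS to the post-tDCS recording."""
    pre = difference_waves(pre_epochs)
    post = difference_waves(post_epochs)
    return {dev: post[dev] - pre[dev] for dev in DEVIANTS}
```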

Electroencephalogram/event-related potential (EEG/ERP) recording and transcranial direct current stimulation (tDCS) setup

We hypothesized that 10 minutes of tDCS would enhance music processing.

Our results indicated that differences between pre- and post-tDCS brain waves were evident only in some conditions. Noninvasive brain stimulation, including tDCS, has the potential to be used as a clinical tool for enhancing auditory processing, but further studies need to examine how experimental parameters (dosage, duration, frequency, etc.) influence the brain responses associated with auditory processing.