4aAA10 – Acoustic Effects of Face Masks on Speech: Impulse Response Measurements Between Two Head and Torso Simulators

Victoria Anderson – vranderson@unomaha.edu
Lily Wang – lilywang@unl.edu
Chris Stecker – cstecker@spatialhearing.org
University of Nebraska–Lincoln, Omaha campus
1110 S 67th Street
Omaha, Nebraska

Popular version of 4aAA10 – Acoustic effects of face masks on speech: Impulse response measurements between two binaural mannikins
Presented Thursday morning, December 2nd, 2021
181st ASA Meeting

Due to the COVID-19 pandemic, masks that cover both the mouth and nose have been used to reduce the spread of illness. While they are effective at preventing the transmission of COVID-19, they have also had a noticeable impact on communication: many find it difficult to understand a speaker who is wearing a mask. Masks affect the sound level and direction of speech and, if they are opaque, can block visual cues that help in understanding speech. Many studies have explored the effect face masks have on understanding speech. The purpose of this project was to begin assembling a database of the effect that common face masks have on impulse responses from one head and torso simulator (HATS) to another. An impulse response is a measurement of how sound radiates out from a source and bounces through a space, and the resulting impulse response data can be used by researchers to simulate masked verbal communication scenarios.

To see how the masks specifically affect the impulse response, all measurements were taken in an anechoic chamber so that no reverberant sound would be included in the measurement. One HATS was placed in the middle of the chamber as the source, and another HATS was placed at varying positions as the receiver. The mouth of the source HATS was covered with various face masks: paper, cloth, N95, nano, and face shield. These were worn individually and in combination with a face shield to capture a wider range of masked conditions that would reasonably occur in real life. The receiver HATS took measurements at 90° and 45° from the source, at distances of 6 and 8 feet. A sine sweep, a signal that changes frequency over a set amount of time, was played to determine the impulse response of each masked condition at every location. The receiver HATS measured the impulse response in both the right and left ears, and the software used to produce the sine sweep was also used to analyze and store the measurement data.

This data will be available for use in simulated communication scenarios to better portray how sound behaves in a space when it comes from a masked speaker.
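For readers who want a concrete picture of the measurement chain, here is a minimal sketch in Python of how an exponential sine sweep can be generated and a recorded sweep deconvolved into an impulse response. This is an illustration, not the authors' actual measurement software; the sample rate, sweep length, and frequency range are assumptions.

```python
# Minimal sketch: exponential sine sweep and impulse-response recovery by
# deconvolution with an inverse filter (Farina-style method).
import numpy as np
from scipy.signal import fftconvolve

fs = 48000                    # sample rate in Hz (assumed)
T = 10.0                      # sweep duration in seconds (assumed)
f1, f2 = 20.0, 20000.0        # start and end frequencies in Hz (assumed)

t = np.arange(int(T * fs)) / fs
R = np.log(f2 / f1)

# Exponential (logarithmic) sine sweep: frequency rises from f1 to f2 over T seconds
sweep = np.sin(2 * np.pi * f1 * T / R * (np.exp(t * R / T) - 1))

# Inverse filter: time-reversed sweep with an amplitude envelope that
# compensates for the sweep's extra energy at low frequencies
inv_filter = sweep[::-1] * np.exp(-t * R / T)

def impulse_response(recorded, inv_filter):
    """Deconvolve a recorded sweep (e.g., one ear of the receiver HATS)
    into an impulse response by convolving it with the inverse filter."""
    ir = fftconvolve(recorded, inv_filter, mode="full")
    return ir / np.max(np.abs(ir))    # normalize so masked conditions are comparable

# Toy check: if the "room" were just a 100-sample delay, the recovered
# impulse response would show a single peak at that delay.
recorded = np.concatenate([np.zeros(100), sweep])
ir = impulse_response(recorded, inv_filter)
```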


 

3aSC7 – Human Beatboxing: A Vocal Exploration

Alexis Dehais-Underdown – alexis-dehais-underdown@sorbonne-nouvelle.fr
Paul Vignes – vignes.paul@gmail.com
Lise Crevier-Buchman – lise.buchman1@gmail.com
Didier Demolin – didier.demolin@sorbonne-nouvelle.fr
Université Sorbonne-Nouvelle
13, rue de Santeuil
75005, Paris, FRANCE

Popular version of 3aSC7 – Human beatboxing: Physiological aspects of drum imitation
Presented Wednesday morning, December 1st, 2021
181st ASA Meeting, Seattle, Washington
Read the article in Proceedings of Meetings on Acoustics

We are interested in exploring the potential of the human vocal tract by understanding beatboxing production. Human Beatboxing (HBB) is a musical technique that uses the vocal tract to imitate musical instruments. Like languages such as French or English, HBB relies on combining smaller units into larger ones. Unlike linguistic systems, however, HBB carries no meaning: while we speak to be understood, beatboxers do not perform to be understood. Speech production obeys linguistic constraints that ensure efficient communication; for example, each language has a finite number of vowels and consonants. This is not the case for HBB, because beatboxers use a larger number of sounds. We hypothesize that beatboxers acquire a more accurate and extensive knowledge of the physical capacities of the vocal tract, which allows them to use this larger inventory of sounds.

Acquisition of laryngoscopic data (left) and acoustic & aerodynamic data (right)

We used three techniques with five professional beatboxers: (1) aerodynamic recordings, (2) laryngoscopic recordings, and (3) acoustic recordings. Aerodynamic data gives information about the pressure and airflow changes that result from articulatory movements. Laryngoscopic images show the different anatomical laryngeal structures and their role in beatboxing production. Acoustic data allows us to investigate the sound characteristics in terms of frequency and amplitude. We extracted nine basic beatboxing sounds from our database: the classic kick drum and its humming variant, the closed hi-hat and its humming variant, the inward k-snare and its humming variant, the cough snare, and the lips roll and its humming variant. Humming is a beatboxing strategy that allows simultaneous and independent articulation in the mouth and melodic voice production in the larynx. Some of these sounds are illustrated here:

The preliminary results are very interesting. While speech is mainly produced on an egressive airflow from the lungs (i.e., the exhalation phase of breathing), HBB is not. We found a wide range of mechanisms used to produce the basic sounds. Mechanisms were described by where the airflow is set in motion (lungs, larynx, or mouth) and by the direction the airflow travels (into or out of the vocal tract). The sounds show different combinations of airflow location and direction:
• buccal egressive (humming classic kick and closed hi-hat) and ingressive (humming k-snare and lips roll)
• pulmonic egressive (cough snare) and ingressive (classic inward k-snare and lips roll)
• laryngeal egressive (classic kick drum and closed hi-hat) and ingressive (classic k-snare and inward classic kick drum)
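Returning to the acoustic recordings mentioned above, here is a minimal Python sketch of how a single beatboxing sound can be characterized in terms of frequency and amplitude. It is a generic illustration, not the authors' analysis pipeline; the file name and analysis settings are hypothetical.

```python
# Minimal sketch: spectrogram (frequency over time) and short-term RMS
# amplitude of one recorded beatboxing sound.
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram

fs, x = wavfile.read("classic_kick_drum.wav")   # hypothetical single-sound recording
x = x.astype(np.float64)
if x.ndim > 1:
    x = x.mean(axis=1)                          # mix stereo to mono

# Frequency content over time
freqs, times, Sxx = spectrogram(x, fs=fs, nperseg=1024, noverlap=768)

# Short-term RMS amplitude in dB, computed over 20 ms frames
frame = int(0.020 * fs)
rms = np.array([np.sqrt(np.mean(x[i:i + frame] ** 2))
                for i in range(0, len(x) - frame, frame)])
rms_db = 20 * np.log10(rms + 1e-12)
```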

The same sound may be produced differently by different beatboxers yet sound perceptually similar. HBB also displays high pressure values, which suggests that these mechanisms are more powerful than those used in quiet conversational speech.

In the absence of linguistic constraints, artists exploit the capacities of the vocal tract more freely. This raises several questions: how do beatboxers reorganize respiratory activity, how do they coordinate sounds with one another, and how do they avoid lesions or damage to vocal tract structures? Our research project will provide further analysis of the description and coordination of beatboxing sounds at different speed rates, based on MRI, laryngoscopic, aerodynamic, and acoustic data.

____________________

See also: Alexis Dehais-Underdown, Paul Vignes, Lise Crevier-Buchman, and Didier Demolin, “In and out: production mechanisms in Human Beatboxing”, Proc. Mtgs. Acoust. 45, 060005 (2021). https://doi.org/10.1121/2.0001543

2aSC1 – Testing Invisible Participants: Conducting Behavioural Science Online During the Pandemic

Prof Jennifer Rodd
Department of Experimental Psychology, University College London
j.rodd@ucl.ac.uk
@jennirodd

Popular version of paper 2aSC1 Collecting experimental data online: How to maintain data quality when you can’t see your participants
Presented at the 180th ASA meeting

In early 2020 many researchers across the world had to close up their labs and head home to help prevent further spread of coronavirus.

If this pandemic had arrived a few years earlier, these restrictions on testing human volunteers in person would have resulted in a near-complete shutdown of behavioural research. Fortunately, the last 10 years have seen rapid advances in the software needed to conduct behavioural research online (e.g., Gorilla, jsPsych), and researchers now have access to well-regulated pools of paid participants (e.g., Prolific). This allowed the many researchers who had already switched to online data collection to continue collecting data throughout the pandemic. In addition, many lab-based researchers who may have been sceptical about online data collection made the switch to online experiments over the last year. Jo Evershed (Founder and CEO of Gorilla Experiment Builder) reports that the number of participants who completed a task online using Gorilla nearly tripled between the first quarter of 2020 and the same period in 2021.

But this rapid shift to online research is not without problems. Many researchers have well-founded concerns about the lack of experimental control that arises when we cannot directly observe our participants.

Based on 8 years of running behavioural research online, I encourage researchers to embrace online research, but argue that we must carefully adapt our research protocols to maintain high data quality.

I present a general framework for conducting online research. This framework requires researchers to explicitly specify how moving data collection online might negatively impact their data and undermine their theoretical conclusions. Key questions include:

  • Where are participants doing the experiment? Somewhere noisy or distracting? Will this make data noisy or introduce systematic bias?


  • What equipment are participants using? Slow internet connection? Small screen? Headphones or speakers? How might this impact results?


  • Are participants who they say they are? Why might they lie about their age or language background? Does this matter?


  • Can participants cheat on your task? By writing things down as they go, or looking up information on the internet?


I encourage researchers to take a ‘worst case’ approach and assume that some of the data they collect will inevitably be of poor quality. The onus is on us to build in experiment-specific safeguards to ensure that poor-quality data can be reliably identified and excluded from our analyses. Sometimes this can be achieved by pre-specifying performance criteria on existing tasks, but it often involves creating new tasks that provide critical information about our participants and their behaviour. These additional steps must be taken prior to data collection and can be time-consuming, but they are vital for maintaining the credibility of data obtained using online methods.
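As a simple illustration of what pre-specified safeguards can look like in practice, here is a hypothetical Python sketch. The criteria, thresholds, and column names are invented for the example and would be tailored to each experiment and fixed before data collection.

```python
# Minimal sketch (hypothetical criteria and column names): pre-specified
# exclusion rules applied to per-participant summaries before the main analysis.
import pandas as pd

data = pd.read_csv("participants.csv")          # hypothetical summary file

# Criteria decided (and ideally pre-registered) before data collection
criteria = (
    (data["headphone_check_correct"] >= 5)      # passed a headphone screening task
    & (data["catch_trial_accuracy"] >= 0.90)    # attended to easy catch trials
    & (data["native_language"] == "English")    # matches the recruited population
    & (data["completion_minutes"] <= 60)        # did not abandon and resume the task
)

included = data[criteria]
excluded = data[~criteria]
print(f"Analyzing {len(included)} participants; excluded {len(excluded)}.")
```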

1aSC2 – The McGurk Illusion

Kristin J. Van Engen – kvanengen@wustl.edu
Washington University in St. Louis
1 Brookings Dr.
Saint Louis, MO 63130

Popular version of paper 1aSC2 The McGurk illusion
Presented Tuesday morning, June 8, 2021
180th ASA Meeting, Acoustics in Focus

In 1976, Harry McGurk and John MacDonald published their now-famous article, “Hearing Lips and Seeing Voices.” The study was a remarkable demonstration of how what we see affects what we hear: when the audio for the syllable “ba” was presented to listeners with the video of a face saying “ga”, listeners consistently reported hearing “da”.

That original paper has been cited approximately 7500 times to date, and in the subsequent 45 years, the “McGurk effect” has been used in countless studies of audiovisual processing in humans. It is typically assumed that people who are more susceptible to the illusion are also better at integrating auditory and visual information. This assumption has led to the use of susceptibility to the McGurk illusion as a measure of an individual’s ability to process audiovisual speech.

However, when it comes to understanding real-world multisensory speech perception, there are several reasons to think that McGurk-style stimuli are poorly suited to the task. Most problematic is the fact that McGurk stimuli rely on audiovisual incongruence that never occurs in real-life audiovisual speech perception. Furthermore, recent studies show that susceptibility to the effect does not actually correlate with performance on audiovisual speech perception tasks such as understanding sentences in noisy conditions. This presentation reviews these issues, arguing that, while the McGurk effect is a fascinating illusion, it is the wrong tool for understanding the combined use of auditory and visual information during speech perception.

2aSC8 – Tips for collecting self-recordings on smartphones

Valerie Freeman – valerie.freeman@okstate.edu
Oklahoma State University
042 Social Sciences & Humanities
Stillwater, OK 74078

Popular version of paper 2aSC8 Tips for collecting self-recordings on smartphones
Presented Wednesday morning, June 9, 2021
180th ASA Meeting, Acoustics in Focus

When the pandemic hit, researchers who were in the middle of collecting data with people in person had to find another way. Speech scientists whose data consists of audio recordings of people talking switched to remote methods like Zoom or asking people to record themselves on their phones. But this switch came with challenges. We’re used to recording people in our labs with expensive microphones, in quiet sound booths where we can control the background noise and how far away our talkers sit from the mic. We worried that the audio quality from smartphones or Zoom wouldn’t be good enough for the acoustic measures we take. So, we got creative. Some of us did tests to verify that phones and Zoom are okay for our most common measurements (Freeman & De Decker, 2021; Freeman et al., 2020), some devised ways to test people’s hardware before beginning, some delivered special equipment to participants’ homes, and others shifted their focus to things that didn’t require perfect audio quality.
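As an illustration of the kind of automated check a lab might run on incoming files, here is a minimal Python sketch. It is not the Sociophonetics Lab's actual workflow; the thresholds and file name are hypothetical.

```python
# Minimal sketch: basic quality screening of a self-recorded WAV file
# before acoustic analysis. Thresholds are illustrative only.
import numpy as np
from scipy.io import wavfile

def screen_recording(path, min_fs=22050, min_seconds=60.0, clip_fraction=0.001):
    """Flag common problems in a participant's self-recording."""
    fs, x = wavfile.read(path)
    full_scale = np.iinfo(x.dtype).max if np.issubdtype(x.dtype, np.integer) else 1.0
    x = x.astype(np.float64)
    if x.ndim > 1:
        x = x.mean(axis=1)                      # mix stereo to mono
    issues = []
    if fs < min_fs:
        issues.append(f"low sample rate ({fs} Hz)")
    if len(x) / fs < min_seconds:
        issues.append("recording is shorter than expected")
    if np.mean(np.abs(x) >= 0.99 * full_scale) > clip_fraction:
        issues.append("possible clipping (mic too close or level too high)")
    return issues

print(screen_recording("participant_017.wav"))  # hypothetical file name
```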

A photo of professional recording equipment in a laboratory sound booth – how speech scientists usually make recordings.

For one study in the Sociophonetics Lab at Oklahoma State University, we switched to having people record themselves on their phones or computers, and three weeks later, we had 50 new recordings – compared to the 10 we’d recorded in person over three weeks pre-pandemic! The procedure was short and simple: fill out some demographics, start up a voice recording app, read some words and stories aloud, email me the recording, and get a $5 gift card.

Along the way, we learned some tricks to keep things running smoothly. We allowed people to use any device and app they liked, and our instructions included links to some user-friendly voice memo apps for people who hadn’t used one before. The instructions were easy to read on a phone, and there weren’t too many steps. The whole procedure took less than 15 minutes, and the little gift card helped. We asked participants to sit close to their device in a quiet room with carpet and soft furniture (to reduce echo) and no background talking or music. To make it easier for older folks, I offered extra credit to my classes to help relatives get set up, we included a link to print the words to read aloud, and we could even walk people through it over Zoom, so we could record them instead.

And it worked! We got over 100 good-quality recordings from people all over the state – and many of them never would have come to the lab on campus, making our study more representative of Oklahoma than if we’d done it all in person.

While this year has been challenging, the ways researchers have learned to use consumer technology to collect data remotely will be an asset even after the pandemic subsides. We can include more people who can’t come to campus, and researchers with limited resources can do more with less – both of which can increase the diversity and inclusiveness of scientific research.



See more about the Sociophonetics Lab at sophon.okstate.edu.

1aSC1 – Untangling the link between working memory and understanding speech

Adam Bosen – adam.bosen@boystown.org
Boys Town National Research Hospital
555 N. 30th St
Omaha, NE 68131

Popular version of paper 1aSC1 Reconsidering reading span as the sole measure of working memory in speech recognition research
Presented Tuesday morning, June 8th, 2021
180th ASA Meeting, Acoustics in Focus

Many patients with cochlear implants have difficulty understanding speech. Cochlear implants often do not convey all of the pieces of speech, so the patient often has to use their memory of what they heard to fill in the missing pieces. As a result, their ability to understand speech is correlated with their performance on working memory tests (O’Neill et al., 2019). Working memory is our ability to simultaneously remember some information while working on other information. For example, if you want to add 57 and 38 in your head you need to sum 7+8 and then hold the result in memory while you work on summing 50+30.

The reading span test is a common tool for measuring working memory. In this test, people see lists of alternating sentences and letters and must decide whether each sentence makes sense while simultaneously remembering the letters. The reading span test is important because it often predicts how well people with hearing loss can understand speech.

What we do not know is why the reading span test is related to speech understanding. One idea is that the ability to simultaneously remember and work on interpreting what you heard is essential for understanding unclear speech. To test this idea, our lab asked young adults with normal hearing to try to understand unclear sentences. These sentences were mixed with two other people talking in the background and then processed to mimic the limited signal a cochlear implant provides.

[Vocoded Speech in Babble.mp3, An unclear recording of someone saying “If the farm is rented, the rent must be paid” with other people talking in the background.]
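For readers curious what "processed to mimic the limited signal a cochlear implant provides" can involve, here is a minimal noise-vocoder sketch in Python. It is a generic illustration of this kind of processing, not the authors' exact stimulus preparation; the band count and filter settings are assumptions.

```python
# Minimal noise-vocoder sketch: keep only the slow amplitude envelope of
# speech in a few frequency bands and use it to modulate band-limited noise,
# a common way to simulate the coarse signal a cochlear implant conveys.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, fs, n_bands=8, f_lo=100.0, f_hi=7000.0):
    """x: mono speech signal (float array); fs: sample rate, e.g., 44100 Hz."""
    rng = np.random.default_rng(0)
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)                 # log-spaced band edges
    env_sos = butter(2, 50.0, btype="lowpass", fs=fs, output="sos")
    out = np.zeros(len(x), dtype=np.float64)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(band_sos, x)
        envelope = sosfiltfilt(env_sos, np.abs(hilbert(band)))    # smooth amplitude envelope
        carrier = sosfiltfilt(band_sos, rng.standard_normal(len(x)))  # band-limited noise
        out += envelope * carrier
    return out / np.max(np.abs(out))
```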

They also completed memory tests that did not require them to work on anything, such as remembering lists of spoken numbers or words on a screen. These tests were as good as reading span at predicting how well these participants could understand unclear speech. This finding indicates that the reading span test is just one way to assess the parts of memory that relate to understanding speech. We conclude that the ability to simultaneously remember and work on information is not the only part of memory that helps us understand unclear speech.

We also tested older adults with cochlear implants on their ability to understand sentences and their ability to remember lists of numbers. Surprisingly, we did not find a relationship between remembering lists of numbers and understanding speech like we did in young adults with normal hearing. This finding indicates that age and/or hearing loss change which parts of working memory relate to understanding speech. Previous work suggests that some parts of working memory tend to decline with age, while others do not (Bopp & Verhaeghen, 2005; Oberauer, 2005). We conclude that further untangling the link between working memory and understanding speech requires measuring different parts of memory using multiple tests.

Bopp, K. L., & Verhaeghen, P. (2005). Aging and Verbal Memory Span: A Meta-Analysis. Journal of Gerontology, 60B(5), 223–233. https://doi.org/10.1093/geronb/60.5.P223

O’Neill, E. R., Kreft, H. A., & Oxenham, A. J. (2019). Cognitive factors contribute to speech perception in cochlear-implant users and age-matched normal-hearing listeners under vocoded conditions. The Journal of the Acoustical Society of America, 146(1), 195–210. https://doi.org/10.1121/1.5116009

Oberauer, K. (2005). Control of the contents of working memory – A comparison of two paradigms and two age groups. Journal of Experimental Psychology: Learning Memory and Cognition, 31(4), 714–728. https://doi.org/10.1037/0278-7393.31.4.714