What do a glass bottle and an ancient Indian flute have in common? Explorations in acoustic color

Ananya Sen Gupta – ananya-sengupta@uiowa.edu
Department of Electrical and Computer Engineering
University of Iowa
Iowa City, IA 52242
United States

Trevor Smith – trevor-smith@uiowa.edu

Panchajanya Dey – panchajanyadey@gmail.com
@panchajanya_official

Popular version of 5aMU4 – Exploring the acoustic color signature patterns of Bansuri, the traditional Indian bamboo flute using principles of the Helmholtz generator and geometric signal processing techniques
Presented at the 188th ASA Meeting
Read the abstract at https://eppro01.ativ.me//web/index.php?page=Session&project=ASAICA25&id=3848014

–The research described in this Acoustics Lay Language Paper may not have yet been peer reviewed–

The Bansuri, the ancient Indian bamboo flute

 


Bansuri, the ancient Indian bamboo flute, is of rich historical, cultural, and spiritual significance to South Asian musical heritage. It is mentioned in ancient Hindu texts dating back centuries, sometimes millennia, and is still played all over India today in classical, folk, film, and other musical genres. Made from a single bamboo reed, with seven finger holes (six of which are typically played) and one blow-hole, the Bansuri carries the rich melody of wind whistling through the tropical woods. In terms of musical acoustics, the Bansuri essentially works as a composite Helmholtz resonator, also known as a wind throb, with a cavity that is cylindrical rather than spherical and only partially open. The cavity opens through whichever finger holes are uncovered during playing, as well as through the open end of the shaft. Helmholtz resonance refers to the phenomenon of air resonance in a cavity, an effect named after the German physicist Hermann von Helmholtz. The bansuri's sound is created when air entering through the blow-hole is trapped inside the cavity of the bamboo shaft before it leaves, primarily through the open end of the shaft and the first open finger hole.
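
For readers who want to see the numbers behind the glass-bottle demonstrations, here is a minimal sketch of the classic Helmholtz-resonator formula, f = (c / 2π) √(A / (V · L_eff)). The bottle dimensions and the end-correction factor below are illustrative assumptions, not measurements from the talk.

    # Minimal sketch of the Helmholtz-resonator formula (the "glass bottle" case).
    # Dimensions are hypothetical, roughly those of a small bottle, and the end
    # correction is a common approximation rather than a measured value.
    import math

    SPEED_OF_SOUND = 343.0  # m/s at roughly 20 C

    def helmholtz_frequency(neck_radius_m, neck_length_m, cavity_volume_m3):
        area = math.pi * neck_radius_m ** 2
        effective_neck = neck_length_m + 1.7 * neck_radius_m  # approximate end correction
        return (SPEED_OF_SOUND / (2 * math.pi)) * math.sqrt(
            area / (cavity_volume_m3 * effective_neck))

    # A ~0.75 L bottle with a 1 cm-radius, 5 cm-long neck hums at roughly 140 Hz:
    print(round(helmholtz_frequency(0.01, 0.05, 0.00075), 1), "Hz")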

The longer the effective air column, which depends on how many finger holes are closed, the lower the fundamental resonant frequency. However, the acoustical quality of the bansuri is determined not only by the fundamental (lowest) frequency but also by the relative dominance of the harmonics (higher octaves). The different octaves (a typical bansuri has a range of three octaves) can be activated by the player by controlling the angle and “beam-width” of the blow, which significantly affects the dynamics of the air pressure, vorticity, and air flow. A direct blow into the blow-hole, for any finger-hole combination, activates the direct propagation mode, in which the lowest octave is dominant. To hit the higher octaves of the same note, the flautist has to blow at an angle to activate the other modes of sound propagation, which proceed through the air column as well as the wooden body of the bansuri.
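
As a rough illustration of that air-column relationship, the sketch below treats the bansuri bore as an idealized open-open pipe whose fundamental is c / (2L). Real instruments need end corrections, bore-shape details, and the player's embouchure, and the effective lengths used here are hypothetical.

    # Minimal sketch: idealized open-open pipe model of the bansuri bore.
    # Treat as illustrative only; it ignores end corrections and embouchure effects.
    SPEED_OF_SOUND = 343.0  # m/s

    def pipe_harmonics(effective_length_m, n_harmonics=5):
        """Return approximate resonant frequencies (Hz) of an open-open pipe."""
        f1 = SPEED_OF_SOUND / (2.0 * effective_length_m)
        return [n * f1 for n in range(1, n_harmonics + 1)]

    # Closing more finger holes lengthens the effective air column and lowers the pitch.
    for length in (0.30, 0.45, 0.60):  # hypothetical effective lengths in meters
        print(f"L = {length:.2f} m -> first harmonics (Hz):",
              [round(f, 1) for f in pipe_harmonics(length, 3)])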

The accompanying videos and images show a basic demonstration of the bansuri as a musical instrument by Panchajanya Dey, simple demonstrations of a glass bottle as a Helmholtz resonator, and an exposition of how acoustic color (shown in the figures) can be used to bring interdisciplinary artists together to create new forms of music.

Acoustic color is a popular data science tool that expresses the relative distribution of power across the frequency spectrum as a function of time. Visually, these are images whose colormap (red = high, blue = low) represents the relative power across the harmonics of the flute; a rising (or falling) curve within the acoustic color image indicates a rising (or falling) tone for a harmonic. For the bansuri, the harmonic structures appear as non-linear, braid-like curves within the acoustic color image. The higher harmonics, which may contain useful melodic information, are often embedded in background noise that sounds like hiss, likely produced by mixing of airflow modes and irregular reed vibrations. However, some hiss is natural to the flute, and filtering it all out makes the music lose its authenticity.

In the talk, we presented computational techniques based on harmonic filtering to separate the modes of acoustic propagation and sound production in the bansuri, e.g. filtering out leakage due to mixing of modes. We also showed how the geometric aspects of the acoustic color features (e.g. harmonic signatures) may be exploited to create a fluid feature dictionary. The purpose of this dictionary is to store the harmonic signatures of different melodic movements without sacrificing the rigor of musical grammar or the authentic earthy sound of the bansuri (some of the hiss is natural and supposed to be there). This fluid feature repository may be harnessed with large language models (LLMs) or similar AI/ML architectures to enable machine interpretation of Indian classical music, to create collaborative infrastructure that lets artists from different musical traditions experiment with an authentic software testbed, and to support other exciting applications.
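
For readers curious how an acoustic color image is produced in practice, the sketch below computes a basic short-time power spectrum with SciPy and displays it with a red-high, blue-low colormap. The filename, window length, and plotting choices are illustrative, not the authors' exact processing chain.

    # Minimal sketch of an "acoustic color" image: relative power vs. frequency and time.
    # Assumes a mono recording "bansuri.wav" (placeholder filename).
    import numpy as np
    from scipy.io import wavfile
    from scipy.signal import spectrogram
    import matplotlib.pyplot as plt

    fs, x = wavfile.read("bansuri.wav")          # sample rate (Hz), samples
    if x.ndim > 1:
        x = x.mean(axis=1)                       # mix down to mono if needed

    f, t, Sxx = spectrogram(x, fs=fs, nperseg=4096, noverlap=3072)
    power_db = 10 * np.log10(Sxx + 1e-12)        # relative power in dB

    plt.pcolormesh(t, f, power_db, shading="auto", cmap="jet")  # red = high, blue = low
    plt.ylim(0, 5000)                            # zoom to the band holding most flute harmonics
    plt.xlabel("Time (s)")
    plt.ylabel("Frequency (Hz)")
    plt.colorbar(label="Relative power (dB)")
    plt.show()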

Why do Cochlear Implant Users Struggle to Understand Speech in Echoey Spaces?

Prajna BK – prajnab2@illinois.edu

University of Illinois Urbana-Champaign, Speech and Hearing Science, Champaign, IL, 61820, United States

Justin Aronoff

Popular version of 2pSPb4 – Impact of Cochlear Implant Processing on Acoustic Cues Critical for Room Adaptation
Presented at the 188th ASA Meeting
Read the abstract at https://eppro01.ativ.me//web/index.php?page=Session&project=ASAICA25&id=3867053

–The research described in this Acoustics Lay Language Paper may not have yet been peer reviewed–

Have you ever wondered how we manage to understand someone in echoey and noisy spaces? For people using cochlear implants, understanding speech in these environments is especially difficult—and our research aims to explore why.

Figure 1. Spectrogram of reverberant speech before (top) and after (bottom) Cochlear Implant processing

 

When sound is produced in a room, it reflects off surfaces and lingers—creating reverberation. Reflections of both target speech and background noise make understanding speech even more difficult. However, for listeners with typical hearing, the brain quickly adapts to these reflections through short-term exposure, helping separate the speech signal from the room’s acoustic “fingerprint.” This process, known as adaptation, relies on specific sound features: the reverberation tail (the lingering energy after the speech stops), reduced modulation depth (how much the amplitude of the speech varies), and increased energy at low frequencies. Together, these cues create temporal and spectral patterns that the brain can group as separate from the speech itself.
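
To make two of these cues concrete, here is a minimal sketch (not the study's analysis code) showing how the reverberation tail can be summarized from a room impulse response and how reverberation shrinks modulation depth. The synthetic impulse response, decay time, and 4 Hz modulation rate are invented for illustration.

    # Minimal sketch of two room-adaptation cues: the reverberation tail and modulation depth.
    import numpy as np

    rng = np.random.default_rng(0)
    fs = 8000
    rir = np.exp(-np.arange(fs) / (0.4 * fs)) * rng.standard_normal(fs)  # synthetic 1 s impulse response
    speech_env = 0.5 * (1 + np.sin(2 * np.pi * 4 * np.arange(fs) / fs))  # 4 Hz amplitude modulation

    def schroeder_decay_db(rir):
        """Energy decay curve of an impulse response (the 'reverberation tail')."""
        energy = np.cumsum(rir[::-1] ** 2)[::-1]   # backward-integrated energy
        return 10 * np.log10(energy / energy[0] + 1e-12)

    def modulation_depth(envelope):
        """Crude modulation depth: how much the amplitude envelope swings."""
        return (envelope.max() - envelope.min()) / (envelope.max() + envelope.min() + 1e-12)

    # Reverberation smears the envelope, so modulation depth drops in echoey rooms.
    blurred_env = np.convolve(speech_env, np.abs(rir) / np.abs(rir).sum(), mode="same")
    print("decay at 0.5 s:", round(schroeder_decay_db(rir)[fs // 2], 1), "dB")
    print("modulation depth, dry:", round(modulation_depth(speech_env), 2))
    print("modulation depth, reverberant:", round(modulation_depth(blurred_env), 2))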

While typical-hearing listeners adapt, many cochlear implant (CI) users report extreme difficulty understanding speech in everyday places like restaurants, where background noise and sound reflections are common. Although cochlear implants have been remarkably effective in restoring access to sound and speech for people with profound hearing loss, they still fall short in complex acoustic environments. This study explores the nature of distortions introduced by cochlear implants to key acoustic cues that listeners with typical hearing use to adapt to reverberant rooms.

The study examined how cochlear implant signal processing affects these cues by analyzing room impulse response signals before and after simulated CI processing. Two key parameters were manipulated. The first, the input dynamic range (IDR), determines how much of the incoming sound is preserved before compression and affects how soft and loud sounds are balanced in the delivered electric signal. The second, the logarithmic growth function (LGF), controls how sharply the sound is compressed at higher levels. A lower LGF results in more abrupt shifts in volume, which can distort fine details in the sound.
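
As a rough illustration of what these two parameters do, the sketch below uses one common textbook form of logarithmic compression. The parameter values and the exact mapping used in the clinical processors studied may differ; this only shows how an input window and a compression curve shape the electric output.

    # Minimal sketch of how an input dynamic range (IDR) and an LGF-style curve
    # map acoustic level onto a normalized electric output. Values are illustrative.
    import numpy as np

    def lgf_compress(level_db, idr_db=40.0, max_db=65.0, rho=416.0):
        """Map acoustic level (dB) into a normalized 0..1 electric output."""
        min_db = max_db - idr_db                           # IDR sets the floor of the input window
        x = np.clip((level_db - min_db) / idr_db, 0.0, 1.0)
        return np.log(1.0 + rho * x) / np.log(1.0 + rho)   # smaller rho -> gentler, more linear curve

    levels = np.linspace(0, 80, 9)
    print(np.round(lgf_compress(levels, idr_db=40), 2))    # narrower IDR discards more soft sound
    print(np.round(lgf_compress(levels, idr_db=60), 2))    # wider IDR keeps softer sounds in range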

The results show that cochlear implant processing significantly alters the acoustic cues that support adaptation. Specifically, it reduces the fidelity with which modulations are preserved, shortens the reverberation tail, and diminishes the low-frequency energy typically added by reflections. Overall, this degrades the speech clarity index of the sound, which can contribute to CI users’ difficulty communicating in reflective spaces.

Further, increasing the IDR extended the reverberation tail but also reduced the clarity index by increasing the relative contribution of reverberant energy to the total energy. Similarly, lowering the LGF caused more abrupt energy changes in the reverberation tail, degrading modulation fidelity. Interestingly, it also led to a more gradual drop-off in low-frequency energy, highlighting a complex trade-off.

Together, these findings suggest that cochlear implant users may struggle in reverberant environments not only because of reflections but also because their devices alter or distort the acoustic regularities that enable room adaptation. Improving how cochlear implants encode these features could make speech more intelligible in real-world, echo-filled spaces.

Monitoring offshore construction with fiber optic sensing

William Jenkins – wfjenkins@ucsd.edu

Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, 92093, United States

Ying-Tsong Lin
Scripps Institution of Oceanography
University of California San Diego
La Jolla, CA 92093, USA

Wenbo Wu
Woods Hole Oceanographic Institution
Woods Hole, MA 02543, USA

Popular version of 2aAB7 – Integrating hydrophone data and distributed acoustic sensing for pile driving noise monitoring in offshore environments
Presented at the 188th ASA Meeting
Read the abstract at https://eppro01.ativ.me//web/index.php?page=Session&project=ASAICA25&id=3864105

–The research described in this Acoustics Lay Language Paper may not have yet been peer reviewed–

Photo by JJ Ying on Unsplash

Throughout recorded history, the sea has provided humanity with resources and access to global trade. The discovery of marine oil and gas reserves transformed offshore activity in the 20th century, and today the growing demand for sustainable energy has led to the development of offshore wind energy. While these developments have brought economic benefits, they have also increased the potential for environmental impacts.

Animals in marine ecosystems have evolved to thrive in a world dominated by sound. While animals on land rely primarily on vision to navigate their environment, marine animals have adapted to a world where light is scarce and sound is abundant. Most notably, marine mammals such as whales and dolphins rely on sound for navigation, communication, and hunting, and there is a growing body of evidence that other species, such as fish and invertebrates, also use sound for these purposes. Monitoring the soundscape of the ocean is an important component of understanding the potential impacts of offshore activity on marine ecosystems.

Our study focuses on the 2023 construction of the Vineyard Wind project, an offshore wind farm located south of Martha’s Vineyard, Massachusetts. Wind farm construction often involves pile driving, which generates impulsive noise that can, in certain conditions, adversely affect marine life, though modern construction operations employ protocols designed to mitigate these effects. Construction operations are acoustically monitored to measure the affected soundscape, assess the effectiveness of noise mitigation, and identify marine mammal vocalizations in the area.

Figure 1. A spectrogram from a hydrophone shows pulses from pile driving (vertical striations) and vocalizations from a nearby fin whale (horizontal striations at 20 Hz) during the 2023 construction of the Vineyard Wind project.

Traditionally, acoustic monitoring is performed using hydrophones located in the vicinity of pile driving. Figure 1 shows a spectrogram of data collected by an array of four hydrophones deployed near the construction site. The spectrogram shows the amount of sound energy at different frequencies over time, with red colors indicating higher sound levels. In the data, the vertical lines indicate pile driving pulses. In the recording, vocalizations from a nearby fin whale are also present.

Figure 2. A fin whale surfaces near Greenland (image courtesy of Aqqa Rosing-Asvid – Visit Greenland, CC BY 2.0 via Wikimedia Commons).

In this study, we also utilize a nearby fiber optic cable that provides data connectivity to the Martha’s Vineyard Coastal Observatory operated by the Woods Hole Oceanographic Institution. The cable is capable of distributed acoustic sensing (DAS), a technology that uses laser light in fiber optic cables to measure vibrations along the length of the cable. DAS is a promising technology for marine monitoring, as it provides high-resolution data over long distances. An example of DAS data is shown in Figure 3, where signals from 100 channels are arranged vertically by distance along the cable. The vertical striations in the data indicate pile driving pulses traveling through the array.

Figure 3. Data from 100 channels of a distributed acoustic sensing (DAS) array at Martha’s Vineyard Coastal Observatory. Vertical striations are pulses from pile driving arriving at the array.
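
To show what such a channels-by-time display looks like in code, here is a minimal plotting sketch with synthetic data standing in for the actual Vineyard Wind recordings; the sample rate, channel spacing, and pulse timing are invented for illustration.

    # Minimal sketch of a DAS "waterfall" display: each row is one channel (a location
    # along the cable), each column a time sample. Synthetic data only.
    import numpy as np
    import matplotlib.pyplot as plt

    fs = 500.0                                    # samples per second per channel (illustrative)
    n_channels, n_samples = 100, 5000
    rng = np.random.default_rng(0)
    das = rng.standard_normal((n_channels, n_samples)) * 0.2   # stand-in for strain-rate data
    for k in range(5):                            # inject fake pile-driving pulses every 2 s
        das[:, 1000 * k + 200: 1000 * k + 220] += 3.0

    channel_spacing_m = 10.0                      # hypothetical spacing between channels
    extent = [0, n_samples / fs, 0, n_channels * channel_spacing_m]

    plt.imshow(das, aspect="auto", origin="lower", extent=extent, cmap="RdBu_r")
    plt.xlabel("Time (s)")
    plt.ylabel("Distance along cable (m)")
    plt.title("Pulses arriving across DAS channels (synthetic)")
    plt.colorbar(label="Strain rate (arb. units)")
    plt.show()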

These results suggest that DAS can detect and characterize pile driving noise, offering a complementary approach to traditional hydrophone arrays. The continuous nature of the fiber optic sensing allows us to monitor the entire construction process with unprecedented spatial resolution, revealing how acoustic energy propagates through various marine environments.

As offshore human activity continues to expand globally, integrating such innovative acoustic monitoring techniques will be crucial for environmentally responsible development of our ocean resources.

How Drones Use Sound to See and Map 3D Spaces

Hala Abualsaud – habualsa@ucsd.edu
LinkedIn: linkedin.com/in/hala-abualsaud
ECE, University of California San Diego
San Diego, California 92130
United States

Peter Gerstoft – pgerstoft@ucsd.edu
ECE, University of California San Diego
San Diego, California 92130
United States

Popular version of 2aCA7 – Acoustic Simultaneous Localization and Mapping for Drone Navigation in Complex Environments
Presented at the 188th ASA Meeting
Read the abstract at https://eppro01.ativ.me/appinfo.php?page=Session&project=ASAICA25&id=3867163&server=eppro01.ativ.me

–The research described in this Acoustics Lay Language Paper may not have yet been peer reviewed–

When drones fly indoors, such as inside warehouses, tunnels, or disaster zones, they can’t rely on GPS or cameras to know where they are. Instead, we propose something different: a drone that “listens” to its surroundings to navigate and map the environment.

We developed a new system called acSLAM (acoustic Simultaneous Localization and Mapping) that uses sound to guide a drone in 3D space. Our drone carries three microphone arrays, arranged in triangles, along with a motion sensor called an IMU (inertial measurement unit). As the drone moves, it records sounds and small changes in movement. Using this information, it estimates its own position and finds where multiple sound sources are located at the same time.

To handle the complexity of real 3D motion (where rotations can easily become unstable), we represent the drone’s orientation using quaternions – a way of describing rotation that avoids problems like gimbal lock, where the drone would otherwise lose its sense of direction. Quaternions work better than traditional methods because they keep track of rotation smoothly and consistently, even during fast or complex motion. They don’t get tripped up by tricky angles or repeated turns, which helps the drone stay accurately oriented as it moves through 3D space.
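
As a concrete, generic illustration of this idea (not the acSLAM implementation itself), the sketch below advances an orientation quaternion using gyroscope readings and renormalizes it at every step, which is the standard way quaternions sidestep gimbal lock; the angular rate and time step are made up.

    # Minimal sketch: integrating angular-rate readings with quaternion multiplication.
    import numpy as np

    def quat_multiply(q, r):
        """Hamilton product of two quaternions (w, x, y, z)."""
        w1, x1, y1, z1 = q
        w2, x2, y2, z2 = r
        return np.array([
            w1*w2 - x1*x2 - y1*y2 - z1*z2,
            w1*x2 + x1*w2 + y1*z2 - z1*y2,
            w1*y2 - x1*z2 + y1*w2 + z1*x2,
            w1*z2 + x1*y2 - y1*x2 + z1*w2,
        ])

    def integrate_gyro(q, omega, dt):
        """Advance orientation quaternion q by angular velocity omega (rad/s) over dt."""
        dq = quat_multiply(q, np.array([0.0, *omega])) * 0.5 * dt
        q_new = q + dq
        return q_new / np.linalg.norm(q_new)     # renormalize to stay a unit quaternion

    q = np.array([1.0, 0.0, 0.0, 0.0])           # identity orientation
    for _ in range(100):                         # a fast yaw spin; no gimbal-lock singularity
        q = integrate_gyro(q, omega=np.array([0.0, 0.0, 3.0]), dt=0.01)
    print(np.round(q, 3))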

Our system works by first listening for where sounds are coming from (their angle of arrival) and measuring time differences (time difference of arrival) between microphones. Combining these clues with the drone’s movement, acSLAM builds a map of where sounds are in the room, like where people are talking or where machines are running.
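
Here is a minimal sketch of the time-difference-of-arrival idea using two synthetic microphone signals; the sample rate, delay, and plain cross-correlation are illustrative simplifications of the array processing used in acSLAM.

    # Minimal sketch of a TDOA estimate between two microphones via cross-correlation.
    import numpy as np

    fs = 16000                                    # sample rate (Hz)
    rng = np.random.default_rng(0)
    source = rng.standard_normal(fs)              # 1 s of broadband "sound source"
    true_delay = 12                               # samples of extra travel time to mic 2

    mic1 = source
    mic2 = np.concatenate([np.zeros(true_delay), source[:-true_delay]])

    corr = np.correlate(mic2, mic1, mode="full")  # peak location reveals the lag
    lag = np.argmax(corr) - (len(mic1) - 1)
    print(f"estimated TDOA: {lag / fs * 1e3:.2f} ms ({lag} samples)")  # expect 12 samples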

We use advanced filtering methods (particle filters for the drone’s movement and Extended Kalman filters for the sound sources) to make sense of noisy real-world data. The system updates itself every step of the way, refining the drone’s position and improving the map as it gathers more information.
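
The sketch below illustrates the particle-filter half of this idea in a stripped-down form: particles representing candidate drone positions are propagated with an assumed velocity and reweighted by how well they explain a bearing measurement to a single, hypothetical source. The per-source Extended Kalman Filters and the real measurement models are omitted, and all numbers are invented.

    # Minimal sketch of a particle-filter predict/update step for the drone's position.
    import numpy as np

    rng = np.random.default_rng(1)
    n_particles = 500
    particles = rng.normal(0.0, 1.0, size=(n_particles, 3))   # candidate positions (x, y, z)
    weights = np.ones(n_particles) / n_particles

    source = np.array([4.0, 2.0, 1.0])            # hypothetical known sound-source position

    def predict(particles, velocity, dt, motion_noise=0.05):
        """Move every particle by the IMU-derived velocity, plus noise."""
        return particles + velocity * dt + rng.normal(0, motion_noise, particles.shape)

    def update(particles, weights, measured_bearing, noise_std=0.1):
        """Reweight particles by how well they explain the measured bearing to the source."""
        diff = source - particles
        predicted = diff / np.linalg.norm(diff, axis=1, keepdims=True)   # unit bearing vectors
        error = np.linalg.norm(predicted - measured_bearing, axis=1)
        weights = weights * np.exp(-0.5 * (error / noise_std) ** 2)
        return weights / weights.sum()

    particles = predict(particles, velocity=np.array([1.0, 0.0, 0.0]), dt=0.1)
    # Bearing consistent with a drone near the origin looking toward the source:
    weights = update(particles, weights, measured_bearing=np.array([0.87, 0.44, 0.22]))
    print("estimated drone position:", np.round(weights @ particles, 2))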

In testing, we found that using multiple sound observations, instead of relying on just one, dramatically improved the drone’s ability to localize itself and map sources accurately. Even when the drone made sharp turns or accelerated quickly, the system stayed reliable.

This approach has exciting applications: drones could someday explore collapsed buildings, find survivors after disasters, or inspect underground spaces — all by listening carefully to their environment, without needing light, cameras, or external signals.

In short, we taught a drone not just to hear but to think about what it hears, and fly smarter because of it.

Does Virtual Reality Match Reality? Vocal Performance Across Environments

Pasquale Bottalico – pb81@illinois.edu

University of Illinois, Urbana-Champaign
Champaign, IL 61820
United States

Carly Wingfield², Charlie Nudelman¹, Joshua Glasner³, Yvonne Gonzales Redman¹,²

  1. Department of Speech and Hearing Science, University of Illinois, Urbana-Champaign
  2. School of Music, University of Illinois Urbana-Champaign
  3. School of Graduate Studies, Delaware Valley University

Popular version of 2aAAa1 – Does Virtual Reality Match Reality? Vocal Performance Across Environments
Presented at the 188th ASA Meeting
Read the abstract at https://eppro01.ativ.me/appinfo.php?page=Session&project=ASAICA25&id=3864198&server=eppro01.ativ.me

–The research described in this Acoustics Lay Language Paper may not have yet been peer reviewed–

Singers often perform in very different spaces than where they practice—sometimes in small, dry rooms and later in large, echoey concert halls. Many singers have shared that this mismatch can affect how they sing. Some say they end up singing too loudly because they can’t hear themselves well, while others say they hold back because the room makes them sound louder than they are. Singers have to adapt their voices to unfamiliar concert halls, and often they have very little rehearsal time to adjust.

While research has shown that instrumentalists adjust their playing depending on the room they are in, there’s been less work looking specifically at singers. Past studies have found that different rooms can change how singers use their voices, including how their vibrato (the small, natural variation in pitch) changes depending on the room’s echo and clarity.

At the University of Illinois, our research team from the School of Music and the Department of Speech and Hearing Science is studying whether virtual reality (VR) can help singers train for different acoustic environments. The big question: can a virtual concert hall give singers the same experience as a real one?

To explore this, we created virtual versions of three real performance spaces on campus (Figure 1).

Figure 1. 360-degree images of the three performance spaces investigated.

Singers wore open-backed headphones and a VR headset while singing into a microphone in a sound booth. As they sang, their voices were processed in real time to sound like they were in one of the real venues, and this audio was sent back to them through the headphones. In the Video (Video1), you can see a singer performing in the sound booth where the acoustic environments were recreated virtually. In the audio file (Audio1), you can hear exactly what the singer heard: the real-time, acoustically processed sound being sent back to their ears through the open-backed headphones.

Video 1. Singer performing in the virtual environment.

 

Audio 1. Example of real-time auralized feedback.
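
For the technically curious, the offline sketch below shows the core of the auralization step: convolving a dry voice recording with a venue's impulse response. The filenames are placeholders, and the study's system performed this processing in real time rather than offline.

    # Minimal offline sketch of auralization: dry voice convolved with a room impulse response.
    import numpy as np
    from scipy.io import wavfile
    from scipy.signal import fftconvolve

    def load_mono(path):
        fs, data = wavfile.read(path)
        data = data.astype(np.float64)
        if data.ndim > 1:
            data = data.mean(axis=1)              # mix down to mono
        return fs, data

    fs_voice, voice = load_mono("dry_voice.wav")        # close-miked singing (placeholder file)
    fs_rir, rir = load_mono("concert_hall_rir.wav")     # venue impulse response (placeholder file)
    assert fs_voice == fs_rir, "resample so both files share one sample rate"

    wet = fftconvolve(voice, rir)                       # the voice as it would sound in the hall
    wet /= np.max(np.abs(wet))                          # normalize to avoid clipping
    wavfile.write("auralized_voice.wav", fs_voice, (wet * 32767).astype(np.int16))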

Ten trained singers performed in both the actual venues (Figure 2) and in virtual versions of those same spaces.

Figure 2. Singer performing in the real environment.

We then compared how they sang and how they felt during each performance. The results showed no significant differences in how the singers used their voices or how they perceived the experience between real and virtual environments.

This is an exciting finding because it suggests that virtual reality could become a valuable tool in voice training. If a singer can’t practice in a real concert hall, a VR simulation could help them get used to the sound and feel of the space ahead of time. This technology could give students greater access to performance preparation and allow voice teachers to guide students through the process in a more flexible and affordable way.

Hear This! Transforming Health Care with Speech-to-Text Technology #ASA187


Researchers study the importance of enunciation in medical speech-to-text software

Media Contact:
AIP Media
301-209-3090
media@aip.org

MELVILLE, N.Y., Nov. 21, 2024 – Speech-to-text (STT) programs are becoming more popular for everyday tasks like hands-free dictation, helping people who are visually impaired, and transcribing speech for those who are hard of hearing. These tools have many uses, and researcher Bożena Kostek from Gdańsk University of Technology is exploring how STT can be better used in the medical field. By studying how clear speech affects STT accuracy, she hopes to improve its usefulness for health care professionals.

“Automating note-taking for patient data is crucial for doctors and radiologists, as it gives the doctors more face-to-face time with patients and allows for better data collection,” Kostek says.

Enunciation may have a crucial role to play in the accuracy of medical record dictation. This image was created with DALL-E 2. Credit: Bożena Kostek

Kostek also explains the challenges they face in this work.

“STT models often struggle with medical terms, especially in Polish, since many have been trained mainly on English. Also, most resources focus on simple language, not specialized medical vocabulary. Noisy hospital environments make it even harder, as health care providers may not speak clearly due to stress or distractions.”

To tackle these issues, a detailed audio dataset was created with Polish medical terms spoken by doctors and specialists in areas like cardiology and pulmonology. The dataset was transcribed with an automatic speech recognition model, technology that converts speech into text. Several metrics, such as Word Error Rate and Character Error Rate, were used to evaluate the quality of the speech recognition. This analysis helps reveal how speech clarity and style affect the accuracy of STT.
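
As an illustration of how such error rates are computed in general (the study's own tooling and data are not reproduced here), the sketch below implements Word Error Rate with a standard edit distance and applies the same idea to characters for Character Error Rate; the example phrases are invented.

    # Minimal sketch of Word Error Rate (WER) and Character Error Rate (CER).
    def edit_distance(ref, hyp):
        """Levenshtein distance between two token sequences."""
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution
        return d[len(ref)][len(hyp)]

    def wer(reference, hypothesis):
        ref_words, hyp_words = reference.split(), hypothesis.split()
        return edit_distance(ref_words, hyp_words) / max(len(ref_words), 1)

    def cer(reference, hypothesis):
        return edit_distance(list(reference), list(hypothesis)) / max(len(reference), 1)

    print(wer("the scan shows bilateral pneumonia",
              "the scan shows bilateral ammonia"))   # 0.2: one of five words substituted
    print(round(cer("ct scan", "cat scan"), 2))      # 0.14: one inserted character out of seven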

Kostek will present this data Thursday, Nov. 21, at 3:25 p.m. ET as part of the virtual 187th Meeting of the Acoustical Society of America, running Nov. 18-22, 2024.

“Medical jargon can be tricky, especially with abbreviations that differ across specialties. This is an even more difficult task when we refer to realistic hospital situations in which the room is not acoustically prepared,” Kostek said.

Currently, the focus is on Polish, but there are plans to expand the research to other languages, like Czech. Collaborations are being established with the University Hospital in Brno to develop medical term resources, aiming to enhance the use of STT technology in health care.

“Even though artificial intelligence is helpful in many situations, many problems should be investigated analytically rather than holistically, focusing on breaking a whole picture into individual parts.”

———————– MORE MEETING INFORMATION ———————–
Main Meeting Website: https://acousticalsociety.org/asa-virtual-fall-2024/
Technical Program: https://eppro01.ativ.me/src/EventPilot/php/express/web/planner.php?id=ASAFALL24

ASA PRESS ROOM
In the coming weeks, ASA’s Press Room will be updated with newsworthy stories and the press conference schedule at https://acoustics.org/asa-press-room/.

LAY LANGUAGE PAPERS
ASA will also share dozens of lay language papers about topics covered at the conference. Lay language papers are summaries (300-500 words) of presentations written by scientists for a general audience. They will be accompanied by photos, audio, and video. Learn more at https://acoustics.org/lay-language-papers/.

PRESS REGISTRATION
ASA will grant free registration to credentialed and professional freelance journalists. If you are a reporter and would like to attend the virtual meeting and/or press conferences, contact AIP Media Services at media@aip.org. For urgent requests, AIP staff can also help with setting up interviews and obtaining images, sound clips, or background information.

ABOUT THE ACOUSTICAL SOCIETY OF AMERICA
The Acoustical Society of America is the premier international scientific society in acoustics devoted to the science and technology of sound. Its 7,000 members worldwide represent a broad spectrum of the study of acoustics. ASA publications include The Journal of the Acoustical Society of America (the world’s leading journal on acoustics), JASA Express Letters, Proceedings of Meetings on Acoustics, Acoustics Today magazine, books, and standards on acoustics. The society also holds two major scientific meetings each year. See https://acousticalsociety.org/.