Laying the Groundwork to Diagnose Speech Impairments in Children with Clinical AI #ASA188

Building large AI datasets can help experts provide faster, earlier diagnoses.

Media Contact:
AIP Media
301-209-3090
media@aip.org

PedzSTAR
pedzstarpr@mekkymedia.com

NEW ORLEANS, May 19, 2025 – Speech and language impairments affect over a million children every year, and identifying and treating these conditions early is key to helping these children overcome them. Clinicians struggling with time, resources, and access are in desperate need of tools to make diagnosing speech impairments faster and more accurate.

Marisha Speights, assistant professor at Northwestern University, built a data pipeline to train clinical artificial intelligence tools for childhood speech screening. She will present her work Monday, May 19, at 8:20 a.m. CT as part of the joint 188th Meeting of the Acoustical Society of America and 25th International Congress on Acoustics, running May 18-23.

Children at a childcare center. Credit: GETTY CC BY-SA

AI-based speech recognition and clinical diagnostic tools have been in use for years, but these tools are typically trained and used exclusively on adult speech. That makes them unsuitable for clinical work involving children. New AI tools must be developed, but there are no large datasets of recorded child speech for these tools to be trained on, in part because building these datasets is uniquely challenging.

“There’s a common misconception that collecting speech from children is as straightforward as it is with adults — but in reality, it requires a much more controlled and developmentally sensitive process,” said Speights. “Unlike adult speech, child speech is highly variable, acoustically distinct, and underrepresented in most training corpora.”

To remedy this, Speights and her colleagues began collecting and analyzing large volumes of child speech recordings to build such a dataset. However, they quickly ran into a problem: collecting, processing, and annotating thousands of speech samples is difficult without exactly the kind of automated tools they were trying to build.

“It’s a bit of a catch-22,” said Speights. “We need automated tools to scale data collection, but we need large datasets to train those tools.”

In response, the researchers built a computational pipeline to turn raw speech data into a useful dataset for training AI tools. They collected a representative sample of speech from children across the country, verified transcripts and enhanced audio quality using their custom software, and provided a platform that will enable detailed annotation by experts.
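
As a rough illustration of what one stage of such a pipeline can look like, the sketch below loads a recording, applies a simple peak normalization as a stand-in for the team's audio-enhancement software (which is not public), and writes out a record ready for expert annotation. The file layout and field names are assumptions for illustration only, not the team's actual tools.

```python
# Illustrative sketch only, not the team's actual pipeline software.
# Peak normalization stands in for audio enhancement; field names are invented.
import json
from pathlib import Path

import numpy as np
import soundfile as sf  # pip install soundfile


def process_sample(wav_path: Path, transcript: str) -> dict:
    """Prepare one child-speech recording for expert annotation."""
    audio, sr = sf.read(wav_path)
    peak = max(float(np.max(np.abs(audio))), 1e-9)
    cleaned = wav_path.with_suffix(".clean.wav")
    sf.write(cleaned, 0.9 * audio / peak, sr)      # crude loudness normalization
    return {
        "source": str(wav_path),
        "cleaned": str(cleaned),
        "transcript": transcript,                  # to be verified against the audio
        "needs_expert_review": True,               # detailed annotation happens on the platform
    }


def build_manifest(samples: list[tuple[Path, str]], out: Path) -> None:
    """Collect processed records into a single dataset manifest."""
    out.write_text(json.dumps([process_sample(p, t) for p, t in samples], indent=2))
```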

The result is a high-quality dataset that can be used to train clinical AI, giving experts access to a powerful set of tools to make diagnosing speech impairments much easier.

“Speech-language pathologists, health care clinicians and educators will be able to use AI-powered systems to flag speech-language concerns earlier, especially in places where access to specialists is limited,” said Speights.

——————— MORE MEETING INFORMATION ———————
Main Meeting Website: https://acousticalsociety.org/new-orleans-2025/
Technical Program: https://eppro01.ativ.me/src/EventPilot/php/express/web/planner.php?id=ASAICA25

ASA PRESS ROOM
In the coming weeks, ASA’s Press Room will be updated with newsworthy stories and the press conference schedule at https://acoustics.org/asa-press-room/.

LAY LANGUAGE PAPERS
ASA will also share dozens of lay language papers about topics covered at the conference. Lay language papers are summaries (300-500 words) of presentations written by scientists for a general audience. They will be accompanied by photos, audio, and video. Learn more at https://acoustics.org/lay-language-papers/.

PRESS REGISTRATION
ASA will grant free registration to credentialed and professional freelance journalists. If you are a reporter and would like to attend the meeting and/or press conferences, contact AIP Media Services at media@aip.org. For urgent requests, AIP staff can also help with setting up interviews and obtaining images, sound clips, or background information.

ABOUT THE ACOUSTICAL SOCIETY OF AMERICA
The Acoustical Society of America is the premier international scientific society in acoustics devoted to the science and technology of sound. Its 7,000 members worldwide represent a broad spectrum of the study of acoustics. ASA publications include The Journal of the Acoustical Society of America (the world’s leading journal on acoustics), JASA Express Letters, Proceedings of Meetings on Acoustics, Acoustics Today magazine, books, and standards on acoustics. The society also holds two major scientific meetings each year. See https://acousticalsociety.org/.

ABOUT THE INTERNATIONAL COMMISSION FOR ACOUSTICS
The purpose of the International Commission for Acoustics (ICA) is to promote international development and collaboration in all fields of acoustics including research, development, education, and standardization. ICA’s mission is to be the reference point for the acoustic community, becoming more inclusive and proactive in our global outreach, increasing coordination and support for the growing international interest and activity in acoustics. Learn more at https://www.icacommission.org/.

Can You Hear Me Now? Fixing Speech Recognition Tech So It Works for Every Child

Vishal Shrivastava – shrivastava_vishal@outlook.com

Northwestern University, School of Communication, Department of Communication Sciences and Disorders, Frances Searle Building, 2240 Campus Drive, Evanston, Illinois, 60208-3550, United States

Marisha Speights, Akangkshya Pathak

Popular version of 1aCA3 – Inclusive automatic speech recognition: A framework for equitable speech recognition in children with disorders
Presented at the 188th ASA Meeting
Read the abstract at https://eppro01.ativ.me//web/index.php?page=Session&project=ASAICA25&id=3867184

–The research described in this Acoustics Lay Language Paper may not have yet been peer reviewed–

Imagine a child says, “My chest hurts,” but the computer hears “My test works.”
In critical moments, mistranscriptions like this can have serious consequences.

Today’s voice recognition tools—like those behind Siri, Alexa, or educational apps—work well for most adults, but often struggle with children’s voices—especially when speech is accented, disordered, or still developing.

We set out to change that by fine-tuning existing systems to ensure every child’s voice is heard clearly, fairly, and without bias.

The Problem: When Technology Leaves Children Behind
Automatic Speech Recognition (ASR) turns spoken words into text. It powers voice commands, transcription tools, and increasingly, educational apps and therapies. But there’s a hidden flaw: these systems are trained mostly on adult speech.

Here’s why that matters:

  • Children’s voices are different—higher-pitched, more variable, and constantly evolving.
  • There’s less data. Collecting labeled child speech—especially from children with disorders—is hard, costly, and ethically complex.
  • Bias creeps in. When systems hear mostly one kind of speech (like adult American English), they treat that as “normal.”

Everything else—like a 6-year-old with a stutter—gets mistaken for noise.

This isn’t just a technical problem. It’s an equity problem. The very tools meant to support children in learning, therapy, or communication often fail to understand them.

Our Approach: Teaching AI to Listen Fairly

Fine-tuning Whisper ASR with domain classifiers and gradient reversal layer

We fine-tuned OpenAI’s Whisper ASR to better understand how children speak—not just by adding more data, but by teaching it to focus on what matters. Like other speech models, Whisper doesn’t only learn the words being said; it also picks up on who is speaking—age, accent, gender, and speech disorders. These cues are baked into the audio, and because Whisper was trained mostly on clear adult speech, it often misinterprets child or disordered speech, treating it as noise.

To fix this, we added a second learning objective—imagine two students in training. One transcribes speech; the other tries to guess traits like the speaker’s age or gender, using only the first student’s notes. Now we challenge the first: transcribe accurately, but reveal nothing about who’s speaking. The better they hide those clues while getting the words right, the better they’ve learned.

That’s the heart of adversarial debiasing. During fine-tuning, we added a domain classifier—like the second student—trained to detect speaker traits from Whisper’s internal audio features. We then inserted a gradient reversal layer to make that job harder, forcing the encoder to scrub away identity cues. All the while, the model continued learning to transcribe—only now, it did so without relying on speaker-specific shortcuts.
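
A minimal PyTorch sketch of this idea, assuming a generic speech encoder rather than the authors' exact Whisper setup: the gradient reversal layer leaves activations untouched on the way forward but flips the gradients flowing back from the speaker-trait classifier, so the encoder is pushed to keep what transcription needs while discarding cues about who is speaking.

```python
# Minimal sketch (PyTorch), not the authors' code: a gradient reversal layer plus
# a domain classifier that tries to recover speaker traits from encoder features.
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies gradients by -lambda in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)


class DomainClassifier(nn.Module):
    """Predicts speaker traits (e.g., child vs. adult) from pooled encoder states."""

    def __init__(self, dim, n_domains):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, n_domains))

    def forward(self, encoder_states, lambd=1.0):
        pooled = encoder_states.mean(dim=1)           # average over time frames
        return self.net(grad_reverse(pooled, lambd))  # reversed gradients reach the encoder


# Schematic training objective: the encoder minimizes the transcription loss while
# maximizing the domain classifier's error, because its gradients arrive reversed.
# total_loss = asr_loss + domain_weight * cross_entropy(domain_logits, domain_labels)
```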

Architecture of the end-to-end domain adversarial fine-tuning

The Result: Technology That Includes Every Voice
By learning to ignore traits that shouldn’t affect understanding—like age, accent, or disordered articulation—Whisper becomes more robust, accurate, and fair. It no longer gets tripped up by voices that don’t match what it was originally trained on. That means fewer errors for children who speak differently, and a step closer to voice technology that works for everyone—not just the majority.

Finding the Right Tools to Interpret Crowd Noise at Sporting Events with AI

Jason Bickmore – jbickmore17@gmail.com

Instagram: @jason.bickmore
Brigham Young University, Department of Physics and Astronomy, Provo, Utah, 84602, United States

Popular version of 1aCA4 – Feature selection for machine-learned crowd reactions at collegiate basketball games
Presented at the 188th ASA Meeting
Read the abstract at https://eppro01.ativ.me/appinfo.php?page=Session&project=ASAICA25&id=3868450&server=eppro01.ativ.me

–The research described in this Acoustics Lay Language Paper may not have yet been peer reviewed–

A mixture of traditional and custom tools is enabling AI to make meaning in an unexplored frontier: crowd noise at sporting events.

The unique link between a crowd’s emotional state and its sound makes crowd noise a promising way to capture feedback about an event continuously and in real-time. Transformed into feedback, crowd noise would help venues improve the experience for fans, sharpen advertisements, and support safety.

To capture this feedback, we turned to machine learning, a popular strategy for making tricky connections. While the tools required to teach AI to interpret speech from a single person are well-understood (think Siri), the tools required to make sense of crowd noise are not.

To find the best tools for this job, we began with a simpler task: teaching an AI model to recognize applause, chanting, distracting the other team, and cheering at college basketball and volleyball games (Fig. 1).

Figure 1: Machine learning identifies crowd behaviors from crowd noise. We helped machine learning models recognize four behaviors: applauding, chanting, cheering, and distracting the other team. Image courtesy of byucougars.com.

We began with a large list of tools, called features, some drawn from traditional speech processing and others created using a custom strategy. After applying five methods to eliminate all but the most powerful features, a blend of traditional and custom features remained. A model trained with these features recognized the four behaviors with at least 70% accuracy.
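
The general recipe can be sketched with standard Python audio and machine-learning libraries. The features and the single selection step below are illustrative stand-ins, not the study's actual feature list or its five selection methods.

```python
# Illustrative sketch, not the study's exact features or selection methods:
# extract common audio features from short clips, keep the most informative ones,
# and train a simple classifier on the four crowd behaviors.
import numpy as np
import librosa                                   # pip install librosa
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline

LABELS = ["applause", "chant", "distraction", "cheer"]


def clip_features(y: np.ndarray, sr: int) -> np.ndarray:
    """Traditional speech-processing features averaged over one clip."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr).mean()
    rms = librosa.feature.rms(y=y).mean()
    return np.concatenate([mfcc, [centroid, rms]])


def train(X: np.ndarray, y: np.ndarray):
    """X: (n_clips, n_features) feature matrix; y: one behavior label per clip."""
    model = make_pipeline(SelectKBest(f_classif, k=10),            # keep strongest features
                          RandomForestClassifier(n_estimators=200, random_state=0))
    return model.fit(X, y)
```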

Based on these results, we concluded that, when interpreting crowd noise, both traditional and custom features have a place. Even though crowd noise is not the situation the traditional tools were designed for, they are still valuable. The custom tools are useful too, complementing the traditional tools and sometimes outperforming them. The tools’ success at recognizing the four behaviors indicates that a similar blend of traditional and custom tools could enable AI models to navigate crowd noise well enough to translate it into real-time feedback. In future work, we will investigate the robustness of these features by checking whether they enable AI to recognize crowd behaviors equally well at events other than college basketball and volleyball games.

Walk to the Beat: How Your Playlist Can Shape Your Emotional Balance

Man Hei LAW – mhlawaa@connect.ust.hk

Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong

Andrew HORNER, Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong

Popular version of 1aCA2 – Exploring the Therapeutic Effects of Emotion Equalization App During Daily Walking Activities
Presented at the 187th ASA Meeting
Read the abstract at https://doi.org/10.1121/10.0034927

–The research described in this Acoustics Lay Language Paper may not have yet been peer reviewed–


We spend much of each day simply getting from one place to another. Some people find walking boring and feel that time drags on; others see it as a chance to think and plan ahead. We believe this short window can be used to help people rebalance their emotions, so they arrive at their next destination feeling refreshed and energized.

Our idea was to give each participant a specific music playlist to listen to while walking. The playlists consisted of Uplifting, Relaxing, Angry, and Sad music, each lasting 15 minutes. While walking, listeners used our Emotion Equalization App (Figures 1a to 1d) to access the playlist, and the app collected their usage data.

Figures 1a to 1d: The interface of the Emotion Equalization App

The key data we focused on were changes in emotion. To measure the listeners’ emotions, we used the Self-Assessment Manikin (SAM) scale, a visual tool that depicts emotion in terms of internal energy level and mood positivity (see Figure 2). After the tests, we analyzed how emotions changed from before to after listening to the music.
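
As a simple illustration of that comparison (the numbers below are invented, not the study's data), one can average the before-to-after change in each SAM dimension per playlist:

```python
# Illustrative sketch with invented values (9-point SAM scale), not the study's data:
# average the before-to-after change in energy (arousal) and mood positivity (valence).
import statistics

# (playlist, arousal_before, arousal_after, valence_before, valence_after)
ratings = [
    ("Angry", 4, 7, 5, 7),
    ("Angry", 3, 6, 4, 6),
    ("Relaxing", 5, 5, 5, 5),
]


def mean_change(playlist: str, before: int, after: int) -> float:
    rows = [r for r in ratings if r[0] == playlist]
    return statistics.mean(r[after] - r[before] for r in rows)


print("Angry: arousal change", mean_change("Angry", 1, 2))
print("Angry: valence change", mean_change("Angry", 3, 4))
```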

Figure 2: The Self-Assessment Manikin scale, showing energy levels at the top and mood positivity at the bottom [1]

The study found that the type of music influenced how far participants walked. Those listening to Uplifting music walked the farthest, followed by Angry, Relaxing, and Sad music. As expected, the music’s energy appeared to carry over into the participants’ physical energy.

So, if music can affect physical energy, can it also have a positive effect on emotions? Can negative music help with mood regulation? An unexpected finding was that Angry music was the most effective therapeutic music for walking. Surprisingly, listening to Angry music while walking not only raised internal energy levels but also promoted positive feelings. Uplifting and Sad music, on the other hand, elicited only positive feelings, while Relaxing music during walking did not increase internal energy levels or positive feelings. This result challenges common assumptions about the therapeutic use of music during walking: Angry music carries a negative vibe, yet our study found it helped individuals relieve stress while walking, ultimately enhancing both internal energy and mood.

If you’re having a tough day, consider listening to an Angry music playlist while taking a walk. It can help in balancing your emotions and uplifting your mood for your next activity.

[1] A. Mehrabian and J. A. Russell, An Approach to Environmental Psychology. Cambridge, MA: The MIT Press, 1974.

Listen In: Infrasonic Whispers Reveal the Hidden Structure of Planetary Interiors and Atmospheres

Quentin Brissaud – quentin@norsar.no
X (twitter): @QuentinBrissaud

Research Scientist, NORSAR, Kjeller, N/A, 2007, Norway

Sven Peter Näsholm, University of Oslo and NORSAR
Marouchka Froment, NORSAR
Antoine Turquet, NORSAR
Tina Kaschwich, NORSAR

Popular version of 1pPAb3 – Exploring a planet with infrasound: challenges in probing the subsurface and the atmosphere
Presented at the 186th ASA Meeting
Read the abstract at https://doi.org/10.1121/10.0026837

–The research described in this Acoustics Lay Language Paper may not have yet been peer reviewed–

Low-frequency sound, called infrasound, can help us better understand our atmosphere and explore distant planetary atmospheres and interiors.

Low-frequency sound waves below 20 Hz, known as infrasound, are inaudible to the human ear. They can be generated by a variety of natural phenomena, including volcanoes, ocean waves, and earthquakes. These waves travel over large distances and can be recorded by instruments such as microbarometers, which are sensitive to small pressure variations. This data can give unique insight into the source of the infrasound and the properties of the media it traveled through, whether solid, oceanic, or atmospheric. In the future, infrasound data might be key to building more robust weather prediction models and to understanding the evolution of our solar system.
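
As a small, hedged example of what recording infrasound means in practice, the sketch below isolates the sub-20 Hz band from a pressure time series with a standard low-pass filter; the sampling rate and signal are synthetic, not data from a real microbarometer.

```python
# Illustrative sketch: keep only the sub-20 Hz (infrasound) part of a pressure signal.
import numpy as np
from scipy.signal import butter, sosfiltfilt


def extract_infrasound(pressure: np.ndarray, fs: float, cutoff_hz: float = 20.0) -> np.ndarray:
    """Low-pass filter a pressure time series sampled at fs Hz."""
    sos = butter(4, cutoff_hz, btype="lowpass", fs=fs, output="sos")
    return sosfiltfilt(sos, pressure)


# Synthetic example: a 0.5 Hz infrasound tone buried in broadband noise.
fs = 200.0
t = np.arange(0, 60, 1 / fs)
signal = np.sin(2 * np.pi * 0.5 * t) + 0.5 * np.random.default_rng(0).normal(size=t.size)
infra = extract_infrasound(signal, fs)
```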

Infrasound has been used on Earth to monitor stratospheric winds, to analyze the characteristics of man-made explosions, and even to detect earthquakes. But its potential extends beyond our home planet. Infrasound waves generated by meteor impacts on Mars have provided insight into the planet’s shallow seismic velocities, as well as near-surface winds and temperatures. On Venus, recent research considers that balloons floating in its atmosphere, and recording infrasound waves, could be one of the few alternatives to detect “venusquakes” and explore its interior, since surface pressures and temperatures are too extreme for conventional instruments.

Sonification of sound generated by the Flores Sea earthquake as recorded by a balloon flying at 19 km altitude.

Until recently, it has been challenging to map infrasound signals to the planetary phenomena that produce and shape them, including ocean waves, atmospheric winds, and planetary interiors. However, our research team and collaborators have made significant strides in this field, developing tools to unlock the potential of infrasound-based planetary research. We connect source and media properties to sound signatures through three different techniques: (1) training neural networks to learn the complex relationships between observed waveforms and source and media characteristics, (2) performing large-scale numerical simulations of seismic and sound waves from earthquakes and explosions, and (3) incorporating knowledge about sources and seismic media from adjacent fields such as geodynamics and atmospheric chemistry to inform our modeling work. Our recent work highlights the potential of infrasound-based inversions to predict high-altitude winds from the sound of ocean waves with machine learning, to map an earthquake’s mechanism to its local sound signature, and to assess the detectability of venusquakes from high-altitude balloons.
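
A toy sketch of technique (1), with entirely synthetic data: a small neural network learns a mapping from infrasound-derived features to a single target such as a high-altitude wind speed. The feature count, target, and network size are invented for illustration; the actual models and datasets are far richer.

```python
# Toy sketch of technique (1) with synthetic data; not the actual models or data.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 32))                 # e.g., 32 spectral features per recording
wind = X[:, :4].sum(axis=1) + 0.1 * rng.normal(size=500)  # synthetic "wind speed" target

model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
model.fit(X[:400], wind[:400])                 # train on 400 examples
print("held-out R^2:", round(model.score(X[400:], wind[400:]), 2))
```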

To ensure the long-term success of infrasound research, dedicated Earth missions will be crucial to collect new data, support the development of efficient global modeling tools, and create rigorous inversion frameworks suited to various planetary environments. Already, infrasound research shows that tuning into a planet’s whisper unlocks crucial insights into its state and evolution.

Consumer label for the noise properties of tires and road pavements

Ulf Sandberg – ulf.sandberg@vti.se

Swedish National Road and Transport Research Institute (VTI), Linkoping, -, SE-58195, Sweden

Popular version of 1pNSb9 – Acoustic labelling of tires, road vehicles and road pavements: A vision for substantially improved procedures
Presented at the 185th ASA Meeting
Read the abstract at https://doi.org/10.1121/10.0022814

Please keep in mind that the research described in this Lay Language Paper may not have yet been peer reviewed.

Few vehicle owners know that they can help reduce traffic noise by making an informed choice of tires, without sacrificing safety or economy. At least they can in Europe, where a regulation requires tires to be labelled with their noise level (among other properties). But the labelling scheme has substantial flaws, for which we propose fixes based on state-of-the-art and innovative measurement methods.

This is where consumer labels come in. In most parts of the world, consumer labels that include noise levels appear on household appliances, lawn mowers, printers, and other products. But when it comes to vehicles, tires, and road pavements, a noise label on the product is rare. So far, it is mandatory only on tires sold in the European Union, and it took considerable effort by noise researchers to get it accepted alongside the more “popular” labels for energy (rolling resistance) and wet grip (skid resistance). Figure 1 shows and explains the European label.

Figure 1: The present European tire label, which must be attached to all tires sold in the European Union, here supplemented by explanations.

Why so much focus on tires? Figure 2 illustrates how much of the noise energy from European cars comes from the tires compared with “propulsion noise,” i.e., noise from the engine, exhaust, transmission, and fans. At speeds above 50 km/h (31 mph), over 80 % of the noise comes from the tires. For trucks and buses the picture is similar, although above 50 km/h the tire share may be 50-80 %. For electric vehicles, of course, the tires dominate as a noise source at virtually all speeds. Thus, already now and even more in the future, consumer choices favouring lower-noise tires will affect traffic noise exposure. To achieve this, tire labels that include noise are needed, and they must be fair and discriminate between the quiet and the noisy.

Figure 2: Distribution of tire/road vs propulsion noise. Calculated for typical traffic with 8 % heavy vehicles in Switzerland [Heutschi et al., 2018].

The EU label is a good start, but there are some problems. When we purchased tires and measured their noise (in A-weighted decibels), we found almost no correlation between the noise labels and our measured levels. To identify the cause of the problem and suggest improvements, the European Road Administrations (CEDR) funded a project named STEER (Strengthening the Effect of quieter tyres on European Roads), supplemented by a supporting project funded by the Swedish Road Administration. STEER found two severe problems in the noise measurement procedure: (1) the test track pavement defined in an ISO standard varied considerably from test site to test site, and (2) in many cases only the noisiest tires were measured, and all other tires of the same type (“family”) were labelled with the same value although they could be up to 6 dB quieter. Such “families” may include over 100 different dimensions, as well as load and speed ratings. Consequently, the full potential of the labelling system is far from being used.
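
To make the “almost no correlation” claim concrete, here is what such a check looks like; the declared and measured levels below are invented for illustration, not the project’s measurements.

```python
# Illustrative sketch with invented values, not the project's measurements:
# compare declared label levels with measured levels via a Pearson correlation.
import numpy as np

label_db = np.array([70, 71, 71, 72, 72, 73, 74])                    # declared on the tire label
measured_db = np.array([72.5, 71.0, 73.8, 70.9, 72.2, 71.5, 72.0])   # measured pass-by levels

r = np.corrcoef(label_db, measured_db)[0, 1]
print(f"Pearson correlation between label and measurement: {r:.2f}")
```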

The author’s presentation at Acoustics 2023 will deal with the noise labelling problem and suggest in more detail how the measurement procedures can be made much more reproducible and representative. This includes using special reference tires to calibrate test track surfaces, producing such test track surfaces by additive manufacturing (3D printing) from digitally described originals, and calculating noise levels through digital simulation, modelling, and AI. Most, if not all, noise measurements could move indoors to laboratories equipped with large steel drums (an existing facility is shown in Figure 3); in that case, a drum surface made by 3D printing is also needed.

 

Figure 3: Laboratory drum facility for measurement of both rolling resistance and noise emission of tires (both for cars and trucks). Note the microphones. The tire is loaded and rolled against one of the three surfaces on the drum. Photo from the Gdansk University of Technology, courtesy of Dr P Mioduszewski.