Laying the Groundwork to Diagnose Speech Impairments in Children with Clinical AI #ASA188

Building large AI datasets can help experts provide faster, earlier diagnoses.

Media Contact:
AIP Media
301-209-3090
media@aip.org

PedzSTAR
pedzstarpr@mekkymedia.com

NEW ORLEANS, May 19, 2025 – Speech and language impairments affect over a million children every year, and identifying and treating these conditions early is key to helping these children overcome them. Clinicians struggling with time, resources, and access are in desperate need of tools to make diagnosing speech impairments faster and more accurate.

Marisha Speights, assistant professor at Northwestern University, built a data pipeline to train clinical artificial intelligence tools for childhood speech screening. She will present her work Monday, May 19, at 8:20 a.m. CT as part of the joint 188th Meeting of the Acoustical Society of America and 25th International Congress on Acoustics, running May 18-23.

Children at a childcare center. Credit: GETTY CC BY-SA

AI-based speech recognition and clinical diagnostic tools have been in use for years, but these tools are typically trained and used exclusively on adult speech. That makes them unsuitable for clinical work involving children. New AI tools must be developed, but there are no large datasets of recorded child speech for these tools to be trained on, in part because building these datasets is uniquely challenging.

“There’s a common misconception that collecting speech from children is as straightforward as it is with adults — but in reality, it requires a much more controlled and developmentally sensitive process,” said Speights. “Unlike adult speech, child speech is highly variable, acoustically distinct, and underrepresented in most training corpora.”

To remedy this, Speights and her colleagues began collecting and analyzing large volumes of child speech recordings to build such a dataset. However, they quickly realized a problem: The collection, processing, and annotation of thousands of speech samples is difficult without exactly the kind of automated tools they were trying to build.

“It’s a bit of a catch-22,” said Speights. “We need automated tools to scale data collection, but we need large datasets to train those tools.”

In response, the researchers built a computational pipeline to turn raw speech data into a useful dataset for training AI tools. They collected a representative sample of speech from children across the country, verified transcripts and enhanced audio quality using their custom software, and provided a platform that will enable detailed annotation by experts.
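
To make the shape of that pipeline concrete, here is a minimal Python sketch. It is illustrative only: the helper functions (enhance_audio, draft_transcript) are hypothetical stand-ins for the team's custom software, and the sample fields are assumptions about what a child-speech recording might carry.

from dataclasses import dataclass
from pathlib import Path

@dataclass
class SpeechSample:
    audio_path: Path           # raw recording collected in the field
    child_age_months: int      # developmental metadata
    transcript: str = ""       # machine draft, later verified by experts
    verified: bool = False

def enhance_audio(path: Path) -> Path:
    """Hypothetical placeholder: denoise and normalize the recording."""
    raise NotImplementedError

def draft_transcript(path: Path) -> str:
    """Hypothetical placeholder: produce a first-pass transcript for review."""
    raise NotImplementedError

def prepare(sample: SpeechSample) -> SpeechSample:
    cleaned = enhance_audio(sample.audio_path)        # improve audio quality
    sample.transcript = draft_transcript(cleaned)     # draft awaiting verification
    sample.verified = False                           # flips to True after expert review
    return sample

Each prepared sample would then move to the annotation platform, where experts correct the transcript and mark clinically relevant features.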

The result is a high-quality dataset that can be used to train clinical AI, giving experts access to a powerful set of tools to make diagnosing speech impairments much easier.

“Speech-language pathologists, health care clinicians and educators will be able to use AI-powered systems to flag speech-language concerns earlier, especially in places where access to specialists is limited,” said Speights.

——————— MORE MEETING INFORMATION ———————
Main Meeting Website: https://acousticalsociety.org/new-orleans-2025/
Technical Program: https://eppro01.ativ.me/src/EventPilot/php/express/web/planner.php?id=ASAICA25

ASA PRESS ROOM
In the coming weeks, ASA’s Press Room will be updated with newsworthy stories and the press conference schedule at https://acoustics.org/asa-press-room/.

LAY LANGUAGE PAPERS
ASA will also share dozens of lay language papers about topics covered at the conference. Lay language papers are summaries (300-500 words) of presentations written by scientists for a general audience. They will be accompanied by photos, audio, and video. Learn more at https://acoustics.org/lay-language-papers/.

PRESS REGISTRATION
ASA will grant free registration to credentialed and professional freelance journalists. If you are a reporter and would like to attend the meeting and/or press conferences, contact AIP Media Services at media@aip.org. For urgent requests, AIP staff can also help with setting up interviews and obtaining images, sound clips, or background information.

ABOUT THE ACOUSTICAL SOCIETY OF AMERICA
The Acoustical Society of America is the premier international scientific society in acoustics devoted to the science and technology of sound. Its 7,000 members worldwide represent a broad spectrum of the study of acoustics. ASA publications include The Journal of the Acoustical Society of America (the world’s leading journal on acoustics), JASA Express Letters, Proceedings of Meetings on Acoustics, Acoustics Today magazine, books, and standards on acoustics. The society also holds two major scientific meetings each year. See https://acousticalsociety.org/.

ABOUT THE INTERNATIONAL COMMISSION FOR ACOUSTICS
The purpose of the International Commission for Acoustics (ICA) is to promote international development and collaboration in all fields of acoustics including research, development, education, and standardization. ICA’s mission is to be the reference point for the acoustic community, becoming more inclusive and proactive in our global outreach, increasing coordination and support for the growing international interest and activity in acoustics. Learn more at https://www.icacommission.org/.

Bats could help the development of AI robots

Rolf Müller – rolf.mueller@vt.edu
X (twitter): @UBDVTLab
Instagram: @ubdvtcenter
Department of Mechanical Engineering, Virginia Tech, Blacksburg, Virginia, 24061, United States

Popular version of 4aAB7 – Of bats and robots
Presented at the 186th ASA Meeting
Read the abstract at https://doi.org/10.1121/10.0027373

–The research described in this Acoustics Lay Language Paper may not have yet been peer reviewed–

Given the ongoing revolution in AI, it may appear that all humanity can do now is wait for AI-powered robots to take over the world. However, while stringing together eloquently worded sentences is certainly impressive, AI is still far from dealing with many of the complexities of the real world. Besides serving the sinister goal of world domination, robots with the intelligence to accomplish demanding missions in complex environments could transform humanity’s ability to deal with key challenges to its survival, e.g., producing food and regrowable materials and maintaining healthy ecosystems.

To accomplish the goal of having a robot operate autonomously in complex real-world environments, a variety of methods have been developed – typically with mixed results at best. These methods usually rest on two related concepts: the creation of a model of the environment’s geometry and the use of deterministic templates to identify objects. However, both approaches have proven limited in their applicability and reliability, and they often come at prohibitively high computational cost.

Bats navigating dense vegetation – such as in the rainforests of Southeast Asia, where our fieldwork is being carried out – may provide a promising alternative to the current approaches: The animals sense their environments through a small number of brief echoes to ultrasonic pulses. The comparatively large wavelengths of these pulses (millimeters to centimeters), combined with the fact that the bats’ ears are not much larger than these wavelengths, condemn bat biosonar to poor angular resolution. This prevents the animals from resolving densely packed scatterers such as leaves in foliage. Hence, bats navigating under such conditions have to deal with inputs that can be classified as “clutter,” i.e., signals that consist of contributions from many unresolvable scatterers and must be treated as random due to lack of knowledge. The nature of these clutter echoes makes it unlikely that bats operating in complex environments rely heavily on three-dimensional models of their surroundings and deterministic templates.

Hence, bats must have evolved sensing paradigms that ensure the clutter echoes contain the relevant sensory information and that this information can be extracted. Coupling between sensing and actuation could very well play a critical role in this, so robotics might be of pivotal importance in replicating the skills of bats in sensing and navigating their environments. Similarly, the deep-learning revolution could bring a previously unavailable ability to extract complex patterns from data to bear on the problem of extracting insight from clutter echoes. Taken together, insights from these approaches could lead to novel acoustics-based paradigms for obtaining relevant sensory information on complex environments in a direct and highly parsimonious manner. These approaches could then enable autonomous robots that learn to navigate new environments quickly and efficiently, transforming the use of autonomous systems in outdoor tasks.

Biomimetic robots designed to reproduce the (a) biosonar sensing and (b) flapping-flight capabilities of bats. Design renderings by Zhengsheng Lu (a) and Adam Carmody (b).

As a pilot demonstration of this approach, we present a twin pair of bioinspired robots, one mimicking the biosonar sensing abilities of bats and the other mimicking the animals’ flapping flight. The biosonar robot has been used successfully to identify locations and find passageways in complex, natural environments. To accomplish this, the biomimetic sonar has been integrated with deep-learning analysis of clutter echoes. The flapping-flight line of biomimetic robots has just started to reproduce some of the many degrees of freedom in the wing kinematics of bats. Ultimately, the two robots are to be integrated into a single system to investigate the coupling of biosonar sensing and flight.
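
To give a sense of how deep learning can be brought to bear on clutter echoes, the sketch below shows a small convolutional network that classifies echo spectrograms (for example, “passageway ahead” versus “no passageway”). The input size, layer choices, and labels are assumptions made for illustration, not the architecture of the actual biosonar robot.

import torch
import torch.nn as nn

class EchoClassifier(nn.Module):
    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(nn.Flatten(), nn.LazyLinear(n_classes))

    def forward(self, spectrograms: torch.Tensor) -> torch.Tensor:
        # spectrograms: (batch, 1, frequency_bins, time_frames)
        return self.head(self.features(spectrograms))

# Example: classify a batch of 8 echo spectrograms with 64 frequency bins and 128 time frames
logits = EchoClassifier()(torch.randn(8, 1, 64, 128))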

Software DJ Creates Automated Pop Song Mashups #Acoustics23

Automated software mixes drums, vocals to create unique musical combinations.

SYDNEY, Dec. 7, 2023 – Song mashups are a staple of many DJs, who mix the vocals and instrumentals from two or more tracks into a seamless blend, creating a new and exciting final product. While the result is fun to listen to, the creation process can often be challenging, requiring knowledge and expertise to select the right tracks and mash them together perfectly.

Xinyang Wu from the Hong Kong University of Science and Technology took a different approach, designing a computer algorithm to intelligently create mashups using the drum tracks from one song and the vocals and instrumentals from another. He will present his work Dec. 7 at 4:20 p.m. Australian Eastern Daylight Time, as part of Acoustics 2023, running Dec. 4-8 at the International Convention Centre Sydney.

The algorithm works to isolate and blend individual components from multiple songs to produce a unique composite with a pleasing sound. Credit: Xinyang Wu

While some algorithms and automated software can attempt to create song mashups, their results are often clunky and unrefined. These methods layer the complete, unaltered tracks on top of each other, aligning them based on detected key moments in the music, rather than skillfully combining the vocals and instrumentals of different songs.

“Imagine trying to make a gourmet meal with only a microwave – that’s sort of what automated mashup software is up against compared to a pro chef, or in this case, a professional music composer,” said Wu. “These pros can get their hands on the original ingredients of a song – the separate vocals, drums, and instruments, known as stems – which lets them mix and match with precision.”

His algorithm takes a different approach, mimicking the process used by professionals. The software works to isolate the stems from each song and identify the most dynamic moments. It adjusts the tempo of the instrumental tracks and adds the drum beat mashup at exactly the right moment for maximum effect.
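
A rough sketch of that flow is shown below. It is a hedged illustration, not Wu's implementation: separate_stems() is a hypothetical placeholder for a source-separation model, while the beat-tracking and time-stretching steps use standard librosa calls.

import librosa

def separate_stems(y, sr):
    """Hypothetical placeholder: return (drums, rest) stems for the mixture y."""
    raise NotImplementedError

def mashup(drum_song_path: str, vocal_song_path: str):
    y_a, sr = librosa.load(drum_song_path, sr=44100)    # drum donor
    y_b, _ = librosa.load(vocal_song_path, sr=44100)    # vocals/instrumental donor

    drums_a, _ = separate_stems(y_a, sr)
    _, rest_b = separate_stems(y_b, sr)

    # Match the drum stem's tempo to the target song before overlaying
    tempo_a, _ = librosa.beat.beat_track(y=drums_a, sr=sr)
    tempo_b, _ = librosa.beat.beat_track(y=rest_b, sr=sr)
    drums_a = librosa.effects.time_stretch(drums_a, rate=float(tempo_b) / float(tempo_a))

    # Overlay, trimming to the shorter signal so the lengths stay aligned
    n = min(len(drums_a), len(rest_b))
    return drums_a[:n] + rest_b[:n], sr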

The result is a unique blend of pleasing lyrics and exciting instrumentals with wide-ranging appeal.

“From what I’ve observed, there’s a clear trend in what listeners prefer in mashups,” said Wu. “Hip-hop drumbeats are the crowd favorite – people seem to really enjoy the groove and rhythm that these beats bring to a mashup.”

Now that the software has been tested on drum tracks, he plans to tackle bass mashups next. For Wu, the dream is to expand the algorithm to incorporate the full instrumental suite and put user-friendly mashup technology directly into the hands of listeners.

“Our ultimate goal is creating an app where users can pick any two songs and choose how to mash them up – whether it’s switching out the drums, bass, instrumentals, or everything together with the other song’s vocals,” said Wu.

###

Contact:
AIP Media
301-209-3090
media@aip.org

———————– MORE MEETING INFORMATION ———————–

The Acoustical Society of America is joining the Australian Acoustical Society to co-host Acoustics 2023 Sydney. This collaborative event will incorporate the Western Pacific Acoustics Conference and the Pacific Rim Underwater Acoustics Conference.

Main meeting website: https://acoustics23sydney.org/
Technical program: https://eppro01.ativ.me/src/EventPilot/php/express/web/planner.php?id=ASAFALL23

ASA PRESS ROOM
In the coming weeks, ASA’s Press Room will be updated with newsworthy stories and the press conference schedule at https://acoustics.org/asa-press-room/.

LAY LANGUAGE PAPERS
ASA will also share dozens of lay language papers about topics covered at the conference. Lay language papers are summaries (300-500 words) of presentations written by scientists for a general audience. They will be accompanied by photos, audio, and video. Learn more at
https://acoustics.org/lay-language-papers/.

PRESS REGISTRATION
ASA will grant free registration to credentialed and professional freelance journalists. If you are a reporter and would like to attend the meeting or virtual press conferences, contact AIP Media Services at media@aip.org. For urgent requests, AIP staff can also help with setting up interviews and obtaining images, sound clips, or background information.

ABOUT THE ACOUSTICAL SOCIETY OF AMERICA
The Acoustical Society of America (ASA) is the premier international scientific society in acoustics devoted to the science and technology of sound. Its 7,000 members worldwide represent a broad spectrum of the study of acoustics. ASA publications include The Journal of the Acoustical Society of America (the world’s leading journal on acoustics), JASA Express Letters, Proceedings of Meetings on Acoustics, Acoustics Today magazine, books, and standards on acoustics. The society also holds two major scientific meetings each year. See https://acousticalsociety.org/.

ABOUT THE AUSTRALIAN ACOUSTICAL SOCIETY
The Australian Acoustical Society (AAS) is the peak technical society for individuals working in acoustics in Australia. The AAS aims to promote and advance the science and practice of acoustics in all its branches to the wider community and provide support to acousticians. Its diverse membership is made up of academia, consultancies, industry, equipment manufacturers and retailers, and all levels of government. The Society supports research and provides regular forums for those who practice or study acoustics across a wide range of fields. The principal activities of the Society are technical meetings held by each State Division, annual conferences which are held by the State Divisions and the ASNZ in rotation, and publication of the journal Acoustics Australia. https://www.acoustics.org.au/

Hey Siri, Can You Hear Me? #ASA184

Experiments show how speech and comprehension change when people communicate with artificial intelligence.

Media Contact:
Ashley Piccone
AIP Media
301-209-3090
media@aip.org

CHICAGO, May 9, 2023 – Millions of people now regularly communicate with AI-based devices, such as smartphones, speakers, and cars. Studying these interactions can improve AI’s ability to understand human speech and determine how talking with technology impacts language.

In their talk, “Clear speech in the new digital era: Speaking and listening clearly to voice-AI systems,” Georgia Zellou and Michelle Cohn of the University of California, Davis will describe experiments to investigate how speech and comprehension change when humans communicate with AI. The presentation will take place Tuesday, May 9, at 12:40 p.m. Eastern U.S. in the Los Angeles/Miami/Scottsdale room, as part of the 184th Meeting of the Acoustical Society of America running May 8-12 at the Chicago Marriott Downtown Magnificent Mile Hotel.

Humans change their voice when communicating with AI. Credit: Michelle Cohn

In their first line of questioning, Zellou and Cohn examined how people adjust their voice when communicating with an AI system compared to talking with another human. They found the participants produced louder and slower speech with less pitch variation when they spoke to voice-AI (e.g., Siri, Alexa), even across identical interactions.

On the listening side, the researchers showed that how humanlike a device sounds impacts how well listeners will understand it. If a listener thinks the voice talking is a device, they are less able to accurately understand. However, if it sounds more humanlike, their comprehension increases. Clear speech, like in the style of a newscaster, was better understood overall, even if it was machine-generated.

“We do see some differences in patterns across human- and machine-directed speech: People are louder and slower when talking to technology. These adjustments are similar to the changes speakers make when talking in background noise, such as in a crowded restaurant,” said Zellou. “People also have expectations that the systems will misunderstand them and that they won’t be able to understand the output.”
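
As a rough illustration of the kinds of acoustic measures behind such comparisons, the snippet below computes simple proxies for loudness, speaking rate, and pitch variability from a recording. It is an assumption-laden sketch using librosa, not the researchers' analysis code.

import librosa
import numpy as np

def describe(path: str) -> dict:
    y, sr = librosa.load(path, sr=16000)
    rms = librosa.feature.rms(y=y)[0]                           # frame-level loudness
    f0, voiced, _ = librosa.pyin(y, fmin=75, fmax=400, sr=sr)   # pitch track (NaN when unvoiced)
    return {
        "duration_s": len(y) / sr,            # longer duration = slower speech for the same sentence
        "mean_rms": float(np.mean(rms)),      # louder speech has higher RMS
        "pitch_sd_hz": float(np.nanstd(f0)),  # smaller value = less pitch variation
    }

# Example: compare the same sentence spoken to a person and to a voice assistant
# print(describe("to_human.wav"), describe("to_device.wav"))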

Clarifying what makes a speaker intelligible will be useful for voice technology. For example, these results suggest that text-to-speech voices should adopt a “clear” style in noisy conditions.

Looking forward, the team aims to apply these studies to people from different age groups and social and language backgrounds. They also want to investigate how people learn language from devices and how linguistic behavior adapts as technology changes.

“There are so many open questions,” said Cohn. “For example, could voice-AI be a source of language change among some speakers? As technology advances, such as with large language models like ChatGPT, the boundary between human and machine is changing – how will our language change with it?”

———————– MORE MEETING INFORMATION ———————–
Main meeting website: https://acousticalsociety.org/asa-meetings/
Technical program: https://eppro02.ativ.me/web/planner.php?id=ASASPRING23&proof=true

ASA PRESS ROOM
In the coming weeks, ASA’s Press Room will be updated with newsworthy stories and the press conference schedule at https://acoustics.org/asa-press-room/.

LAY LANGUAGE PAPERS
ASA will also share dozens of lay language papers about topics covered at the conference. Lay language papers are 300 to 500 word summaries of presentations written by scientists for a general audience. They will be accompanied by photos, audio, and video. Learn more at https://acoustics.org/lay-language-papers/.

PRESS REGISTRATION
ASA will grant free registration to credentialed and professional freelance journalists. If you are a reporter and would like to attend the meeting or virtual press conferences, contact AIP Media Services at media@aip.org.  For urgent requests, AIP staff can also help with setting up interviews and obtaining images, sound clips, or background information.

ABOUT THE ACOUSTICAL SOCIETY OF AMERICA
The Acoustical Society of America (ASA) is the premier international scientific society in acoustics devoted to the science and technology of sound. Its 7,000 members worldwide represent a broad spectrum of the study of acoustics. ASA publications include The Journal of the Acoustical Society of America (the world’s leading journal on acoustics), JASA Express Letters, Proceedings of Meetings on Acoustics, Acoustics Today magazine, books, and standards on acoustics. The society also holds two major scientific meetings each year. See https://acousticalsociety.org/.

Artificial intelligence in music production: controversy & opportunity

Joshua Reiss – joshua.reiss@qmul.ac.uk
Twitter: @IntelSoundEng

Queen Mary University of London, Mile End Road, London, England, E1 4NS, United Kingdom

Popular version of 3aSP1 – Artificial intelligence in music production: controversy and opportunity, presented at the 183rd ASA Meeting.

Music production
In music production, one typically has many sources. They each need to be heard simultaneously, but can all be created in different ways, in different environments and with different attributes. The mix should have all sources sound distinct yet contribute to a nice clean blend of the sounds. To achieve this is labour intensive and requires a professional engineer. Modern production systems help, but they’re incredibly complex and all require manual manipulation. As technology has grown, it has become more functional but not simpler for the user.

Intelligent music production
Intelligent systems could analyse all the incoming signals and determine how they should be modified and combined. This has the potential to revolutionise music production, in effect putting a robot sound engineer inside every recording device, mixing console or audio workstation. Could this be achieved? This question gets to the heart of what is art and what is science, what is the role of the music producer and why we prefer one mix over another.

Figure 1: The architecture of an automatic mixing system. [Image courtesy of the author]
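
As a minimal illustration of one small building block of such a system, the sketch below simply brings every track toward a common loudness before summing. This is an assumption-heavy toy example; a real automatic mixer like the one in Figure 1 analyses far more than RMS level (masking, panning, dynamics, and so on).

import numpy as np

def auto_mix(tracks: list[np.ndarray], target_rms: float = 0.1) -> np.ndarray:
    """Gain each track toward a common RMS level, then sum into a rough mix."""
    n = min(len(t) for t in tracks)
    mix = np.zeros(n)
    for t in tracks:
        t = t[:n]
        rms = np.sqrt(np.mean(t**2)) + 1e-12
        mix += t * (target_rms / rms)       # simple automatic fader
    peak = np.max(np.abs(mix))
    return mix / max(1.0, peak)             # normalise only if the sum would clip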

Perception of mixing
But there is little understanding of how we perceive audio mixes. Almost all studies have been restricted to lab conditions, such as measuring the perceived level of a tone in the presence of background noise. This tells us very little about real-world cases. It doesn’t say how well one can hear the lead vocals when guitar, bass and drums are playing.

Best practices
And we don’t know why one production will sound dull while another makes you laugh and cry, even though both are of the same piece of music, performed by competent sound engineers. So we needed to establish what makes good production, how to translate it into rules, and how to exploit it within algorithms. We needed to step back and explore more fundamental questions, filling gaps in our understanding of production and perception.

Knowledge engineering
We used an approach that incorporated one of the earliest machine learning methods, knowledge engineering. It’s so old school that it’s gone out of fashion. It assumes experts have already figured things out; they are experts, after all. So let’s capture best practices as a set of rules and processes. But this is no easy task. Most sound engineers don’t know what they did. Ask a famous producer what he or she did on a hit song and you often get an answer like “I turned the knob up to 11 to make it sound phat.” How do you turn that into a mathematical equation? Or worse, they say it was magic and can’t be put into words.
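
For illustration, one such rule of thumb might be written down as a few lines of code. The 3 dB margin below is an invented placeholder, not an established best practice.

import math

def rule_vocal_presence(vocal_rms: float, backing_rms: float) -> float:
    """Return a gain change (dB) so the lead vocal sits slightly above the backing."""
    margin_db = 20 * math.log10((vocal_rms + 1e-12) / (backing_rms + 1e-12))
    target_margin_db = 3.0                    # assumed rule of thumb, purely illustrative
    return target_margin_db - margin_db       # positive result means "turn the vocal up"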

We systematically tested all the assumptions about best practices and supplemented them with listening tests that helped us understand how people perceive complex sound mixtures. We also curated multitrack audio, with detailed information about how it was recorded, multiple mixes and evaluations of those mixes.

This enabled us to develop intelligent systems that automate much of the music production process.

Video Caption: An automatic mixing system based on a technology we developed.

Transformational impact
I gave a talk about this once in a room that had panel windows all around. These talks are usually half full. But this time it was packed, and I could see faces outside pressed up against the windows. They all wanted to find out about this idea of automatic mixing. It’s a unique opportunity for academic research to have transformational impact on an entire industry. It addresses the fact that music production technologies are often not fit for purpose. Intelligent systems open up new opportunities. Amateur musicians can create high quality mixes of their content, small venues can put on live events without needing a professional engineer, time and preparation for soundchecks could be drastically reduced, and large venues and broadcasters could significantly cut manpower costs.

Taking away creativity
It’s controversial. We entered an automatic mix in a student recording competition as a sort of Turing Test. Technically we cheated, because the mixes were supposed to be made by students, not by an ‘artificial intelligence’ (AI) created by a student. Afterwards I asked the judges what they thought of the mix. The first two were surprised and curious when I told them how it was done. The third judge offered useful comments when he thought it was a student mix. But when I told him that it was an ‘automatic mix’, he suddenly switched and said it was rubbish and he could tell all along.

Mixing is a creative process where stylistic decisions are made. Is this taking away creativity, is it taking away jobs? Such questions come up time and time again with new technologies, going back to 19th century protests by the Luddites, textile workers who feared that time spent on their skills and craft would be wasted as machines could replace their role in industry.

Not about replacing sound engineers
These are valid concerns, but it’s important to see other perspectives. A tremendous amount of music production work is technical, and audio quality would be improved by addressing these technical problems. As the graffiti artist Banksy said, “All artists are willing to suffer for their work. But why are so few prepared to learn to draw?”

Creativity still requires technical skills. To achieve something wonderful when mixing music, you first have to achieve something pretty good and address issues with masking, microphone placement, level balancing and so on.

Video Caption: Time offset (comb filtering) correction, a technical problem in music production solved by an intelligent system.
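
The core of such a correction can be sketched with a cross-correlation delay estimate. The snippet below is a simplified, integer-sample illustration of the general technique, not the intelligent system shown in the video.

import numpy as np

def align(reference: np.ndarray, other: np.ndarray) -> np.ndarray:
    """Shift 'other' so its content lines up with 'reference' before the two are summed."""
    corr = np.correlate(reference, other, mode="full")
    lag = int(np.argmax(corr)) - (len(other) - 1)   # > 0: reference's content arrives later
    if lag > 0:
        return np.concatenate([np.zeros(lag), other])[: len(other)]  # delay 'other'
    if lag < 0:
        return np.concatenate([other[-lag:], np.zeros(-lag)])        # advance 'other'
    return other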

The real benefit is not replacing sound engineers. It’s dealing with all those situations when a talented engineer is not available: the band practicing in the garage, the small restaurant venue that does not provide any support, or game audio, where dozens of sounds need to be mixed and there is no miniature sound engineer living inside the games console.