Understanding Why Engine Noise Feels Loud in Hybrid Vehicles with AI

Shinichi Suganuma – shinichi_suganuma@camal.mech.chuo-u.ac.jp

Graduate School of Science and Engineering
Chuo University
1-13-27 Kasuga
Bunkyo-ku, Tokyo, 112-8551
Japan

Shimpei Nagae
Nissan Motor Co., Ltd.
Kanagawa, Japan

Takeshi Toi
Chuo University
Tokyo, Japan

Popular version of 4aNSa2 – Development of a Machine Learning Model to Predict Engine Noise Perception Considering Regional and Driving Environment Differences
Presented at the 189th ASA Meeting
Read the abstract at https://eppro02.ativ.me//web/index.php?page=Session&project=ASAASJ25&id=3980090

–The research described in this Acoustics Lay Language Paper may not have yet been peer reviewed–

When driving a hybrid vehicle, many people notice the moment when the quiet electric drive suddenly switches to the engine — and the engine can feel “loud,” even when the actual sound level is modest. Why does this happen? And does the way drivers perceive this noise differ across countries? In this study, we used machine learning to predict how people judge engine noise annoyance and to uncover insights that may help make future hybrid vehicles more comfortable.

Figure 1. AI Model for Predicting Engine Noise Perception
Video 1. On-Road Driving Example for Data Collection

We conducted on-road evaluations in Japan, the United States, and the United Kingdom. During each test, we simultaneously recorded in-cabin sound, vehicle parameters, and drivers’ ratings of engine noise on a three-level scale (“Not noisy,” “Noisy,” “Very noisy”), creating a dataset for AI training. In Japan, we used the series-hybrid Nissan Note e-POWER. In the U.S., where this model is not sold, we reproduced its engine sound on the Nissan Ariya EV, and in the U.K. we used the Qashqai e-POWER engine sound played on the Ariya. Because vehicles, drivers, and road environments differed across regions, the study provided a stringent test of model generality.

Figure 2. On-Road Evaluation Conditions in Japan, the U.S., and the U.K.

First, we tested how accurately AI could predict annoyance using only in-cabin sound metrics such as loudness and sharpness. In Japan, prediction accuracy reached about 57%. When we added three vehicle parameters — engine speed, acceleration torque, and vehicle speed — accuracy increased to 67%, demonstrating that driving conditions, not just sound, play an important role in annoyance perception. The same trend was observed in the U.S. and the U.K.

Figure 3. Prediction Accuracy Improvements Using Vehicle Data and Time History
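
To make the modelling step above concrete, here is a minimal sketch in Python. The synthetic data, feature names (loudness, sharpness, engine speed, torque, vehicle speed), and the random-forest classifier are illustrative assumptions, not the study's actual dataset or model; the point is only to show how adding vehicle parameters to sound metrics changes what a classifier can learn.

    # Illustrative sketch (not the authors' code): compare a classifier trained on
    # sound metrics alone with one that also sees vehicle parameters.
    # All feature values and labels below are synthetic, for demonstration only.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    rng = np.random.default_rng(0)
    n = 1000

    # Hypothetical per-clip features: psychoacoustic metrics and vehicle parameters.
    loudness   = rng.uniform(5, 30, n)        # sone
    sharpness  = rng.uniform(1, 3, n)         # acum
    engine_rpm = rng.uniform(1000, 4500, n)   # engine speed
    torque     = rng.uniform(0, 200, n)       # acceleration torque, N*m
    speed      = rng.uniform(0, 100, n)       # vehicle speed, km/h

    # Synthetic 3-level annoyance label ("Not noisy"=0, "Noisy"=1, "Very noisy"=2),
    # only so the example runs; in the study the labels came from drivers' ratings.
    score = 0.04 * loudness + 0.0006 * engine_rpm + 0.005 * torque + rng.normal(0, 0.5, n)
    y = np.digitize(score, np.quantile(score, [0.33, 0.66]))

    sound_only = np.column_stack([loudness, sharpness])
    sound_plus_vehicle = np.column_stack([loudness, sharpness, engine_rpm, torque, speed])

    for name, X in [("sound only", sound_only), ("sound + vehicle", sound_plus_vehicle)]:
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
        model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
        print(name, "accuracy:", round(accuracy_score(y_te, model.predict(X_te)), 2))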

However, the relative importance of the three vehicle parameters differed by region. In Japan and the U.S., engine speed contributed most strongly to predictions. In contrast, in the U.K., acceleration torque was the most influential factor. This likely reflects the presence of many roundabouts in the U.K. test route, where frequent acceleration and deceleration lead drivers to value the coherence between engine sound and vehicle motion. This aligns with the author’s own experience living in the U.K. for three years.

Next, we incorporated several seconds of engine-speed history into the vehicle parameters. In all regions, adding this short-term history improved prediction accuracy. Although the optimal history length differed slightly — around 5.5 seconds in Japan and 6.5 seconds in the U.S. — the common finding was clear: people judge engine noise not from a single moment but from the pattern of change over several seconds.

Figure 4. Prediction Improvement When Engine-Speed History Is Added
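
A minimal sketch of the windowing idea follows, assuming engine speed is logged at a fixed rate (10 Hz here, an illustrative choice) and that the last few seconds are simply appended to the instantaneous features. The study's actual feature construction may differ; this only shows how a "recent pattern of change" becomes a model input.

    # Illustrative sketch: turn a few seconds of engine-speed history into extra
    # model inputs with a sliding window. The 10 Hz rate and 5.5 s window are
    # assumptions; the study's optimal lengths were region-dependent.
    import numpy as np

    def add_history(engine_speed, current_features, window_s=5.5, fs_hz=10):
        """Append the last `window_s` seconds of engine speed to each feature row."""
        k = int(window_s * fs_hz)                     # samples of history per row
        rows = []
        for t in range(k, len(engine_speed)):
            history = engine_speed[t - k:t]           # recent pattern of change
            rows.append(np.concatenate([current_features[t], history]))
        return np.asarray(rows)

    # Example: 60 s of driving sampled at 10 Hz, with 3 instantaneous features per step.
    rpm = 1500 + 1000 * np.abs(np.sin(np.linspace(0, 6, 600)))
    feats = np.column_stack([rpm, np.gradient(rpm), np.linspace(0, 80, 600)])
    X = add_history(rpm, feats)
    print(X.shape)   # (545, 58): each row now carries 3 features plus 5.5 s of rpm history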

Despite differences in vehicles, traffic environments, and evaluation routes, considering “vehicle operating conditions” together with “recent temporal changes” consistently improved the AI’s ability to predict annoyance across all regions. These findings provide valuable clues for designing hybrid vehicles that feel smoother and more comfortable for drivers around the world.

Can Artificial Intelligence Accurately Clone Dysphonic Voices?

Pasquale Bottalico – pb81@illinois.edu

University of Illinois at Urbana-Champaign
Champaign, Illinois, 61801
United States

Additional Authors
Charles J. Nudelman
Daniel Fogerty
Virginia Tardini
Keiko Ishikawa

Popular version of 2aSCa8 – Can Artificial Intelligence Accurately Clone Dysphonic Voices? A Perceptual and Intelligibility Assessment
Presented at the 189th ASA Meeting
Read the abstract at https://eppro02.ativ.me//web/index.php?page=Session&project=ASAASJ25&id=3981555

–The research described in this Acoustics Lay Language Paper may not have yet been peer reviewed–

Artificial intelligence is now remarkably good at cloning human voices, but can it convincingly imitate a disordered voice? Our findings suggest that while AI excels at copying healthy speech, it still struggles to capture the acoustic complexity of dysphonia, a condition that makes the voice sound rough, strained, or breathy.

Dysphonia affects millions of people and often reduces speech intelligibility, especially in noisy environments. Because collecting large amounts of patient data can be difficult, researchers wondered whether AI voice-cloning technologies might one day help them simulate disordered speech for training, education, or early-stage clinical research.

To test this idea, the team recorded 12 speakers (six with healthy voices and six with dysphonia) and used a commercial AI system to create a digital “voice clone” of each person. These AI voices were trained using about one minute of recorded speech for each speaker. More than 60 listeners participated in three online experiments designed to evaluate whether the AI-generated voice clones truly preserved the qualities of disordered speech.

Watch the short video below to see exactly how the experiment worked.

In the listening tasks, participants heard pairs of sentences. Sometimes both sentences were from the real speaker, sometimes both were AI-generated, and sometimes one was real and one was AI. In some trials, listeners tried to decide whether the two voices came from the same person. In others, they had to identify which sentence (if any) was produced by AI. A third task tested how well listeners understood real and AI-generated dysphonic speech in background noise.
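
For the speech-in-noise task, the general recipe is to scale a noise recording so that the speech-to-noise ratio sits at a chosen level before presenting the mixture to listeners. The snippet below is a generic sketch of that step using synthetic signals; the actual noise type and signal-to-noise ratio used in the study are not specified here, and the 0 dB value is only illustrative.

    # Generic sketch of a "speech in background noise" stimulus: scale the noise
    # so the mixture hits a chosen signal-to-noise ratio (SNR).
    import numpy as np

    def mix_at_snr(speech, noise, snr_db):
        """Return speech + noise, with noise scaled to the requested SNR."""
        noise = noise[: len(speech)]
        p_speech = np.mean(speech ** 2)
        p_noise = np.mean(noise ** 2)
        gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
        return speech + gain * noise

    fs = 16000
    t = np.linspace(0, 1, fs, endpoint=False)
    speech_like = 0.1 * np.sin(2 * np.pi * 220 * t)                 # stand-in for a sentence
    babble_like = 0.05 * np.random.default_rng(1).standard_normal(fs)  # stand-in for noise
    stimulus = mix_at_snr(speech_like, babble_like, snr_db=0.0)     # 0 dB SNR, illustrative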

In the first experiment, as shown in Figure 1, listeners were very accurate when both samples were real. Here, accuracy refers to the proportion of trials in which listeners correctly judged whether the two voice samples were from the same or different speakers. Accuracy dropped slightly when both samples were AI-generated. But when one sample was real and the other AI-generated, performance fell sharply, especially for healthy voices, where the AI clones often sounded strikingly similar to the real person.

Figure 1. Bar plot showing the percentage of correct same/different speaker judgments across conditions for normal and dysphonic voices. Bars represent mean percentages with 95% confidence intervals. Note: RL = real speech; AI = AI-generated speech.
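
As a rough illustration of what each bar and error bar in Figures 1 and 2 represents, the sketch below turns a count of correct responses into a percentage with a 95% confidence interval. The trial counts are hypothetical, and the exact interval method used by the authors is not stated; a simple normal approximation is shown.

    # Illustrative only: accuracy (proportion correct) with a 95% confidence interval.
    import numpy as np

    correct, total = 172, 200                            # hypothetical trial counts
    p = correct / total                                  # accuracy
    half_width = 1.96 * np.sqrt(p * (1 - p) / total)     # normal-approximation 95% CI
    print(f"accuracy = {p:.0%}, 95% CI = [{p - half_width:.0%}, {p + half_width:.0%}]")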

Figure 2. Bar plot showing the percentage of correct AI identification responses across conditions for normal and dysphonic voices. Bars represent mean percentages with 95% confidence intervals. Note: RL = real speech; AI = AI-generated speech.

A second experiment asked listeners to identify which sentences were AI-generated. For healthy voices, AI was difficult to detect. For dysphonic voices, however, listeners were more successful — suggesting the AI system smoothed out or failed to reproduce key features of dysphonia. The results are shown in Figure 2.

The final experiment delivered the strongest finding: AI-generated dysphonic voices were significantly more intelligible than real dysphonic voices when played in background noise. In other words, the AI unintentionally “cleaned up” the voice disorder, creating speech that sounded clearer and easier to understand than the real dysphonic voices. The results are shown in Figure 3.

These results demonstrate that while AI voice cloning is impressively realistic for healthy speech, it does not yet capture the natural irregularities of disordered voices. For now, real patient recordings remain essential. However, this research highlights the exciting potential of improved AI tools in the future.

Figure 3. Mean intelligibility scores (IS) of normal and dysphonic groups in real and AI-generated voice conditions. The IS values vary from 0 to 1. Error bars indicate standard errors. Note: RL = real speech; AI = AI-generated speech.

Laying the Groundwork to Diagnose Speech Impairments in Children with Clinical AI #ASA188

Building large AI datasets can help experts provide faster, earlier diagnoses.

Media Contact:
AIP Media
301-209-3090
media@aip.org

PedzSTAR
pedzstarpr@mekkymedia.com

NEW ORLEANS, May 19, 2025 – Speech and language impairments affect over a million children every year, and identifying and treating these conditions early is key to helping these children overcome them. Clinicians struggling with time, resources, and access are in desperate need of tools to make diagnosing speech impairments faster and more accurate.

Marisha Speights, assistant professor at Northwestern University, built a data pipeline to train clinical artificial intelligence tools for childhood speech screening. She will present her work Monday, May 19, at 8:20 a.m. CT as part of the joint 188th Meeting of the Acoustical Society of America and 25th International Congress on Acoustics, running May 18-23.

Children at a childcare center. Credit: GETTY CC BY-SA

AI-based speech recognition and clinical diagnostic tools have been in use for years, but these tools are typically trained and used exclusively on adult speech. That makes them unsuitable for clinical work involving children. New AI tools must be developed, but there are no large datasets of recorded child speech for these tools to be trained on, in part because building these datasets is uniquely challenging.

“There’s a common misconception that collecting speech from children is as straightforward as it is with adults — but in reality, it requires a much more controlled and developmentally sensitive process,” said Speights. “Unlike adult speech, child speech is highly variable, acoustically distinct, and underrepresented in most training corpora.”

To remedy this, Speights and her colleagues began collecting and analyzing large volumes of child speech recordings to build such a dataset. However, they quickly realized a problem: The collection, processing, and annotation of thousands of speech samples is difficult without exactly the kind of automated tools they were trying to build.

“It’s a bit of a catch-22,” said Speights. “We need automated tools to scale data collection, but we need large datasets to train those tools.”

In response, the researchers built a computational pipeline to turn raw speech data into a useful dataset for training AI tools. They collected a representative sample of speech from children across the country, verified transcripts and enhanced audio quality using their custom software, and provided a platform that will enable detailed annotation by experts.
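
As a rough sketch of one stage of such a pipeline, the snippet below walks a folder of recordings, computes simple quality checks, and flags clips for human review before annotation. The folder name, thresholds, and checks are hypothetical stand-ins; the team's actual software is custom and is not reproduced here.

    # Illustrative sketch (not the project's code) of one pipeline stage:
    # screen raw child speech recordings and queue problem clips for human review.
    from pathlib import Path
    import numpy as np
    import soundfile as sf   # pip install soundfile

    def quality_report(wav_path):
        audio, fs = sf.read(wav_path)
        if audio.ndim > 1:
            audio = audio.mean(axis=1)               # mix down to mono
        rms_db = 20 * np.log10(np.sqrt(np.mean(audio ** 2)) + 1e-12)
        clipped = np.mean(np.abs(audio) > 0.99)      # fraction of near-clipped samples
        return {"file": wav_path.name, "fs": fs, "rms_db": rms_db, "clipped": clipped}

    review_queue = []
    for wav in sorted(Path("child_speech_raw").glob("*.wav")):   # hypothetical folder
        report = quality_report(wav)
        # Flag very quiet or clipped recordings for a human listener instead of
        # letting them slip into the training corpus.
        if report["rms_db"] < -40 or report["clipped"] > 0.01:
            review_queue.append(report)

    print(f"{len(review_queue)} recordings flagged for manual review")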

The result is a high-quality dataset that can be used to train clinical AI, giving experts access to a powerful set of tools to make diagnosing speech impairments much easier.

“Speech-language pathologists, health care clinicians and educators will be able to use AI-powered systems to flag speech-language concerns earlier, especially in places where access to specialists is limited,” said Speights.

——————— MORE MEETING INFORMATION ———————
Main Meeting Website: https://acousticalsociety.org/new-orleans-2025/
Technical Program: https://eppro01.ativ.me/src/EventPilot/php/express/web/planner.php?id=ASAICA25

ASA PRESS ROOM
In the coming weeks, ASA’s Press Room will be updated with newsworthy stories and the press conference schedule at https://acoustics.org/asa-press-room/.

LAY LANGUAGE PAPERS
ASA will also share dozens of lay language papers about topics covered at the conference. Lay language papers are summaries (300-500 words) of presentations written by scientists for a general audience. They will be accompanied by photos, audio, and video. Learn more at https://acoustics.org/lay-language-papers/.

PRESS REGISTRATION
ASA will grant free registration to credentialed and professional freelance journalists. If you are a reporter and would like to attend the meeting and/or press conferences, contact AIP Media Services at media@aip.org. For urgent requests, AIP staff can also help with setting up interviews and obtaining images, sound clips, or background information.

ABOUT THE ACOUSTICAL SOCIETY OF AMERICA
The Acoustical Society of America is the premier international scientific society in acoustics devoted to the science and technology of sound. Its 7,000 members worldwide represent a broad spectrum of the study of acoustics. ASA publications include The Journal of the Acoustical Society of America (the world’s leading journal on acoustics), JASA Express Letters, Proceedings of Meetings on Acoustics, Acoustics Today magazine, books, and standards on acoustics. The society also holds two major scientific meetings each year. See https://acousticalsociety.org/.

ABOUT THE INTERNATIONAL COMMISSION FOR ACOUSTICS
The purpose of the International Commission for Acoustics (ICA) is to promote international development and collaboration in all fields of acoustics including research, development, education, and standardization. ICA’s mission is to be the reference point for the acoustic community, becoming more inclusive and proactive in our global outreach, increasing coordination and support for the growing international interest and activity in acoustics. Learn more at https://www.icacommission.org/.

Finding the Right Tools to Interpret Crowd Noise at Sporting Events with AI

Jason Bickmore – jbickmore17@gmail.com

Instagram: @jason.bickmore
Brigham Young University, Department of Physics and Astronomy, Provo, Utah, 84602, United States

Popular version of 1aCA4 – Feature selection for machine-learned crowd reactions at collegiate basketball games
Presented at the 188th ASA Meeting
Read the abstract at https://doi.org/10.1121/10.0037270

–The research described in this Acoustics Lay Language Paper may not have yet been peer reviewed–

A mixture of traditional and custom tools is enabling AI to make sense of an unexplored frontier: crowd noise at sporting events.

The unique link between a crowd’s emotional state and its sound makes crowd noise a promising way to capture feedback about an event continuously and in real time. Transformed into feedback, crowd noise would help venues improve the experience for fans, sharpen advertisements, and support safety.

To capture this feedback, we turned to machine learning, a popular strategy for making tricky connections. While the tools required to teach AI to interpret speech from a single person are well-understood (think Siri), the tools required to make sense of crowd noise are not.

To find the best tools for this job, we began with a simpler task: teaching an AI model to recognize applause, chanting, distracting the other team, and cheering at college basketball and volleyball games (Fig. 1).

Figure 1: Machine learning identifies crowd behaviors from crowd noise. We helped machine learning models recognize four behaviors: applauding, chanting, cheering, and distracting the other team. Image courtesy of byucougars.com.

We began with a large list of tools, called features, some drawn from traditional speech processing and others created using a custom strategy. After applying five methods to eliminate all but the most powerful features, a blend of traditional and custom features remained. A model trained with these features recognized the four behaviors with at least 70% accuracy.
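
To give a flavor of the feature-selection step, here is a minimal sketch in Python on synthetic data: a scoring method keeps only the strongest features before a classifier is trained and evaluated. The features, the mutual-information criterion, and the random-forest classifier are illustrative assumptions; the study's actual feature list and its five selection methods are not reproduced here.

    # Illustrative sketch of feature selection for a four-class crowd-behavior task.
    # Data, feature count, and the single selection method shown are assumptions.
    import numpy as np
    from sklearn.feature_selection import SelectKBest, mutual_info_classif
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(0)
    n_clips, n_features = 800, 60          # hypothetical: 60 candidate features per clip
    X = rng.standard_normal((n_clips, n_features))
    y = rng.integers(0, 4, n_clips)        # 0=applause, 1=chant, 2=cheer, 3=distraction
    X[:, :5] += y[:, None] * 0.8           # make a handful of features informative

    model = make_pipeline(
        SelectKBest(mutual_info_classif, k=10),      # keep the 10 strongest features
        RandomForestClassifier(n_estimators=200, random_state=0),
    )
    scores = cross_val_score(model, X, y, cv=5)
    print("cross-validated accuracy:", scores.mean().round(2))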

Based on these results, we concluded that, when interpreting crowd noise, both traditional and custom features have a place. Even though crowd noise is not the situation the traditional tools were designed for, they are still valuable. The custom tools are useful too, complementing the traditional tools and sometimes outperforming them. The tools’ success at recognizing the four behaviors indicates that a similar blend of traditional and custom tools could enable AI models to navigate crowd noise well enough to translate it into real-time feedback. In future work, we will investigate the robustness of these features by checking whether they enable AI to recognize crowd behaviors equally well at events other than college basketball and volleyball games.

Software DJ Creates Automated Pop Song Mashups #Acoustics23

Automated software mixes drums, vocals to create unique musical combinations.

SYDNEY, Dec. 7, 2023 – Song mashups are a staple of many DJs, who mix the vocals and instrumentals from two or more tracks into a seamless blend, creating a new and exciting final product. While the result is fun to listen to, the creation process can often be challenging, requiring knowledge and expertise to select the right tracks and mash them together perfectly.

Xinyang Wu from the Hong Kong University of Science and Technology took a different approach, designing a computer algorithm to intelligently create mashups using the drum tracks from one song and the vocals and instrumentals from another. He will present his work Dec. 7 at 4:20 p.m. Australian Eastern Daylight Time, as part of Acoustics 2023, running Dec. 4-8 at the International Convention Centre Sydney.

The algorithm works to isolate and blend individual components from multiple songs to produce a unique composite with a pleasing sound. Credit: Xinyang Wu

While some algorithms and automated software can attempt to create song mashups, their results are often clunky and unrefined. These methods layer the complete, unaltered tracks on top of each other, aligning them based on detected key moments in the music, rather than skillfully combining the vocals and instrumentals of different songs.

“Imagine trying to make a gourmet meal with only a microwave – that’s sort of what automated mashup software is up against compared to a pro chef, or in this case, a professional music composer,” said Wu. “These pros can get their hands on the original ingredients of a song – the separate vocals, drums, and instruments, known as stems – which lets them mix and match with precision.”

His algorithm takes a different approach, mimicking the process used by professionals. The software works to isolate the stems from each song and identify the most dynamic moments. It adjusts the tempo of the instrumental tracks and adds the drum beat mashup at exactly the right moment for maximum effect.
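
As a rough sketch of the tempo-matching step in that process, the snippet below estimates each stem's tempo, stretches the drum stem to match the backing track, and overlays the two. This is not Wu's implementation: the file names are hypothetical, the stem separation itself is omitted, and the open-source librosa library is used only to illustrate the idea.

    # Illustrative tempo-matching sketch, assuming the stems have already been isolated.
    import numpy as np
    import librosa
    import soundfile as sf

    sr = 44100
    drums, _ = librosa.load("drum_stem.wav", sr=sr)                        # hypothetical file
    backing, _ = librosa.load("vocals_and_instrumental_stem.wav", sr=sr)   # hypothetical file

    # Estimate the tempo of each stem.
    tempo_drums = float(np.atleast_1d(librosa.beat.beat_track(y=drums, sr=sr)[0])[0])
    tempo_backing = float(np.atleast_1d(librosa.beat.beat_track(y=backing, sr=sr)[0])[0])

    # Stretch the drum stem so its tempo matches the backing track.
    drums_matched = librosa.effects.time_stretch(drums, rate=tempo_backing / tempo_drums)

    # Overlay the stems and write out the result.
    n = min(len(drums_matched), len(backing))
    mix = 0.5 * drums_matched[:n] + 0.5 * backing[:n]
    sf.write("mashup_sketch.wav", mix, sr)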

The result is a unique blend of pleasing lyrics and exciting instrumentals with wide-ranging appeal.

“From what I’ve observed, there’s a clear trend in what listeners prefer in mashups,” said Wu. “Hip-hop drumbeats are the crowd favorite – people seem to really enjoy the groove and rhythm that these beats bring to a mashup.”

Now that the software has been tested on drum tracks, he plans to tackle bass mashups next. For Wu, the dream is to expand the algorithm to incorporate the full instrumental suite and put user-friendly mashup technology directly into the hands of listeners.

“Our ultimate goal is creating an app where users can pick any two songs and choose how to mash them up – whether it’s switching out the drums, bass, instrumentals, or everything together with the other song’s vocals,” said Wu.

###

Contact:
AIP Media
301-209-3090
media@aip.org

———————– MORE MEETING INFORMATION ———————–

The Acoustical Society of America is joining the Australian Acoustical Society to co-host Acoustics 2023 Sydney. This collaborative event will incorporate the Western Pacific Acoustics Conference and the Pacific Rim Underwater Acoustics Conference.

Main meeting website: https://acoustics23sydney.org/
Technical program: https://eppro01.ativ.me/src/EventPilot/php/express/web/planner.php?id=ASAFALL23

ASA PRESS ROOM
In the coming weeks, ASA’s Press Room will be updated with newsworthy stories and the press conference schedule at https://acoustics.org/asa-press-room/.

LAY LANGUAGE PAPERS
ASA will also share dozens of lay language papers about topics covered at the conference. Lay language papers are summaries (300-500 words) of presentations written by scientists for a general audience. They will be accompanied by photos, audio, and video. Learn more at
https://acoustics.org/lay-language-papers/.

PRESS REGISTRATION
ASA will grant free registration to credentialed and professional freelance journalists. If you are a reporter and would like to attend the meeting or virtual press conferences, contact AIP Media Services at media@aip.org. For urgent requests, AIP staff can also help with setting up interviews and obtaining images, sound clips, or background information.

ABOUT THE ACOUSTICAL SOCIETY OF AMERICA
The Acoustical Society of America (ASA) is the premier international scientific society in acoustics devoted to the science and technology of sound. Its 7,000 members worldwide represent a broad spectrum of the study of acoustics. ASA publications include The Journal of the Acoustical Society of America (the world’s leading journal on acoustics), JASA Express Letters, Proceedings of Meetings on Acoustics, Acoustics Today magazine, books, and standards on acoustics. The society also holds two major scientific meetings each year. See https://acousticalsociety.org/.

ABOUT THE AUSTRALIAN ACOUSTICAL SOCIETY
The Australian Acoustical Society (AAS) is the peak technical society for individuals working in acoustics in Australia. The AAS aims to promote and advance the science and practice of acoustics in all its branches to the wider community and provide support to acousticians. Its diverse membership is drawn from academia, consultancies, industry, equipment manufacturers and retailers, and all levels of government. The Society supports research and provides regular forums for those who practice or study acoustics across a wide range of fields. The principal activities of the Society are technical meetings held by each State Division, annual conferences held by the State Divisions and the ASNZ in rotation, and publication of the journal Acoustics Australia. https://www.acoustics.org.au/