Can Artificial Intelligence Accurately Clone Dysphonic Voices?
Pasquale Bottalico – pb81@illinois.edu
University of Illinois at Urbana-Champaign
Champaign, Illinois, 61801
United States
Additional Authors
Charles J. Nudelman
Daniel Fogerty
Virginia Tardini
Keiko Ishikawa
Popular version of 2aSCa8 – Can Artificial Intelligence Accurately Clone Dysphonic Voices? A Perceptual and Intelligibility Assessment
Presented at the 189th ASA Meeting
Read the abstract at https://eppro02.ativ.me//web/index.php?page=Session&project=ASAASJ25&id=3981555
–The research described in this Acoustics Lay Language Paper may not have yet been peer reviewed–
Artificial intelligence is now remarkably good at cloning human voices, but can it convincingly imitate a disordered voice? Our findings suggest that while AI excels at copying healthy speech, it still struggles to capture the acoustic complexity of dysphonia, a condition that makes the voice sound rough, strained, or breathy.
Dysphonia affects millions of people and often reduces speech intelligibility, especially in noisy environments. Because collecting large amounts of patient data can be difficult, researchers wondered whether AI voice-cloning technologies might one day help them simulate disordered speech for training, education, or early-stage clinical research.
To test this idea, the team recorded 12 speakers (six with healthy voices and six with dysphonia) and used a commercial AI system to create a digital “voice clone” of each person. These AI voices were trained using about one minute of recorded speech for each speaker. More than 60 listeners participated in three online experiments designed to evaluate whether the AI-generated voice clones truly preserved the qualities of disordered speech.
Watch the short video below to see exactly how the experiment worked.
In the listening tasks, participants heard pairs of sentences. Sometimes both sentences were from the real speaker, sometimes both were AI-generated, and sometimes one was real and one was AI. In some trials, listeners tried to decide whether the two voices came from the same person. In others, they had to identify which sentence (if any) was produced by AI. A third task tested how well listeners understood real and AI-generated dysphonic speech in background noise.
In the first experiment, as shown in Figure 1, listeners were very accurate when both samples were real. Here, accuracy refers to the proportion of trials in which listeners correctly judged whether the two voice samples were from the same or different speakers. Accuracy dropped slightly when both samples were AI-generated. But when one sample was real and the other AI-generated, performance fell sharply, especially for healthy voices, where the AI clones often sounded strikingly similar to the real person.
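The accuracy reported here is simply the proportion of correct trials, plotted with 95% confidence intervals. As a minimal illustration (using made-up responses, not the study's data, and a simple normal-approximation interval rather than whatever method the authors used), such a mean and confidence interval can be computed like this:

```python
from statistics import NormalDist

def accuracy_with_ci(responses, confidence=0.95):
    """Proportion of correct trials with a normal-approximation CI.

    responses: list of 1 (correct) / 0 (incorrect) trial outcomes.
    Returns (proportion, lower bound, upper bound), clipped to [0, 1].
    """
    n = len(responses)
    p = sum(responses) / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # e.g. 1.96 for 95%
    half_width = z * (p * (1 - p) / n) ** 0.5
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

# Illustrative data: 1 = correct same/different judgment, 0 = incorrect
trials = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
p, lo, hi = accuracy_with_ci(trials)
print(f"accuracy = {p:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

With only ten trials the interval is very wide and gets clipped at 1.0; the study's intervals are tighter because each condition pools many trials across more than 60 listeners.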

Figure 1. Bar plot showing the percentage of correct same/different speaker judgments across conditions for normal and dysphonic voices. Bars represent mean percentages with 95% confidence intervals. Note: RL = real speech; AI = AI-generated speech.

Figure 2. Bar plot showing the percentage of correct AI identification responses across conditions for normal and dysphonic voices. Bars represent mean percentages with 95% confidence intervals. Note: RL = real speech; AI = AI-generated speech.
These results demonstrate that while AI voice cloning is impressively realistic for healthy speech, it does not yet capture the natural irregularities of disordered voices. For now, real patient recordings remain essential. Still, this research points to the potential of improved AI tools to one day simulate disordered speech for training, education, and early-stage clinical research.

Figure 3. Mean intelligibility scores (IS) of normal and dysphonic groups in real and AI-generated voice conditions. The IS values vary from 0 to 1. Error bars indicate standard errors. Note: RL = real speech; AI = AI-generated speech.