Hear This! Transforming Health Care with Speech-to-Text Technology #ASA187

Researchers study the importance of enunciation in medical speech-to-text software

Media Contact:
AIP Media
301-209-3090
media@aip.org

MELVILLE, N.Y., Nov. 21, 2024 – Speech-to-text (STT) programs are becoming more popular for everyday tasks like hands-free dictation, helping people who are visually impaired, and transcribing speech for those who are hard of hearing. These tools have many uses, and researcher Bożena Kostek from Gdańsk University of Technology is exploring how STT can be put to better use in the medical field. By studying how speech clarity affects STT accuracy, she hopes to improve its usefulness for health care professionals.

“Automating note-taking for patient data is crucial for doctors and radiologists, as it gives the doctors more face-to-face time with patients and allows for better data collection,” Kostek says.

Enunciation may have a crucial role to play in the accuracy of medical record dictation. This image was created with DALL-E 2. Credit: Bożena Kostek

Kostek also explains the challenges the researchers face in this work.

“STT models often struggle with medical terms, especially in Polish, since many have been trained mainly on English. Also, most resources focus on simple language, not specialized medical vocabulary. Noisy hospital environments make it even harder, as health care providers may not speak clearly due to stress or distractions.”

To tackle these issues, a detailed audio dataset was created with Polish medical terms spoken by doctors and specialists in areas like cardiology and pulmonology. This dataset was analyzed using an Automatic Speech Recognition model, technology that converts speech into text, for transcription. Several metrics, such as Word Error Rate and Character Error Rate, were used to evaluate the quality of the speech recognition. This analysis helps understand how speech clarity and style affect the accuracy of STT.
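The two metrics named above can be illustrated with a minimal sketch. It assumes the common definition of both rates as the edit distance (substitutions, insertions, and deletions) between the reference text and the transcript, divided by the length of the reference; the release does not say which tooling or text normalization the researchers used, and the function names and Polish example phrase here are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def character_error_rate(reference, hypothesis):
    """Character-level edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical example: the transcript drops one letter from a medical term.
    reference = "ostre zapalenie oskrzeli"
    transcript = "ostre zapalenie skrzeli"
    print("WER:", word_error_rate(reference, transcript))
    print("CER:", character_error_rate(reference, transcript))
```

In this example a single missing letter makes one of three words wrong, so the word-level rate is much higher than the character-level rate. Reporting both metrics shows whether errors are small slips inside long terms or entire words being misrecognized.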

Kostek will present this data Thursday, Nov. 21, at 3:25 p.m. ET as part of the virtual 187th Meeting of the Acoustical Society of America, running Nov. 18-22, 2024.

“Medical jargon can be tricky, especially with abbreviations that differ across specialties. This is an even more difficult task when we refer to realistic hospital situations in which the room is not acoustically prepared,” Kostek said.

Currently, the focus is on Polish, but there are plans to expand the research to other languages, like Czech. Collaborations are being established with the University Hospital in Brno to develop medical term resources, aiming to enhance the use of STT technology in health care.

“Even though artificial intelligence is helpful in many situations, many problems should be investigated analytically rather than holistically, focusing on breaking a whole picture into individual parts.”

———————– MORE MEETING INFORMATION ———————–
Main Meeting Website: https://acousticalsociety.org/asa-virtual-fall-2024/
Technical Program: https://eppro01.ativ.me/src/EventPilot/php/express/web/planner.php?id=ASAFALL24

ASA PRESS ROOM
In the coming weeks, ASA’s Press Room will be updated with newsworthy stories and the press conference schedule at https://acoustics.org/asa-press-room/.

LAY LANGUAGE PAPERS
ASA will also share dozens of lay language papers about topics covered at the conference. Lay language papers are summaries (300-500 words) of presentations written by scientists for a general audience. They will be accompanied by photos, audio, and video. Learn more at https://acoustics.org/lay-language-papers/.

PRESS REGISTRATION
ASA will grant free registration to credentialed and professional freelance journalists. If you are a reporter and would like to attend the virtual meeting and/or press conferences, contact AIP Media Services at media@aip.org. For urgent requests, AIP staff can also help with setting up interviews and obtaining images, sound clips, or background information.

ABOUT THE ACOUSTICAL SOCIETY OF AMERICA
The Acoustical Society of America is the premier international scientific society in acoustics devoted to the science and technology of sound. Its 7,000 members worldwide represent a broad spectrum of the study of acoustics. ASA publications include The Journal of the Acoustical Society of America (the world’s leading journal on acoustics), JASA Express Letters, Proceedings of Meetings on Acoustics, Acoustics Today magazine, books, and standards on acoustics. The society also holds two major scientific meetings each year. See https://acousticalsociety.org/.

Enhancing Speech Recognition in Healthcare

Andrzej Czyzewski – andczyz@gmail.com

Gdańsk University of Technology, Faculty of Electronics, Telecommunications and Informatics, Multimedia Systems Department, Gdańsk, Pomerania, 80-233, Poland

Popular version of 1aSP6 – Strategies for Preprocessing Speech to Enhance Neural Model Efficiency in Speech-to-Text Applications
Presented at the 187th ASA Meeting
Read the abstract at https://eppro01.ativ.me/appinfo.php?page=IntHtml&project=ASAFALL24&id=3771522&server=eppro01.ativ.me

–The research described in this Acoustics Lay Language Paper may not have yet been peer reviewed–


Effective communication in healthcare is essential, as accurate information can directly impact patient care. This paper discusses research aimed at improving speech recognition technology to help medical professionals document patient information more effectively. By using advanced techniques, we can make speech-to-text systems more reliable for healthcare, ensuring they accurately capture spoken information.

In healthcare settings, professionals often need to quickly and accurately record patient interactions. Traditional typing can be slow and error-prone, while speech recognition allows doctors to dictate notes directly into electronic health records (EHRs), saving time and reducing miscommunication.

The main goal of our research was to test various ways of enhancing speech-to-text accuracy in healthcare. We compared several methods to help the system understand spoken language more clearly. These methods included different ways of analyzing sound, like looking at specific sound patterns or filtering background noise.

In this study, we recorded around 80,000 voice samples from medical professionals. These samples were then processed to highlight important speech patterns, making it easier for the system to learn and recognize medical terms. We used a method called Principal Component Analysis (PCA) to keep the data simple while ensuring essential information was retained.
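As a rough illustration of how PCA keeps data simple while retaining most of the information, the sketch below projects a matrix of feature vectors onto its strongest directions of variation. It is an assumption-laden example rather than the study's pipeline: the function name `pca_reduce`, the random stand-in features, and the choice of 10 components are all hypothetical, and the actual speech features and settings are not stated here.

```python
import numpy as np


def pca_reduce(features, n_components):
    """Project row-wise feature vectors onto their top principal components.

    Returns the reduced data, the component directions, and the share of
    total variance that each kept component explains.
    """
    X = np.asarray(features, dtype=float)
    X_centered = X - X.mean(axis=0)  # PCA works on mean-centered data
    # The rows of Vt are the principal directions, ordered by singular value.
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    variance_share = singular_values**2 / np.sum(singular_values**2)
    reduced = X_centered @ components.T
    return reduced, components, variance_share[:n_components]


if __name__ == "__main__":
    # Stand-in data: 200 recordings, each described by 40 numbers.
    rng = np.random.default_rng(0)
    features = rng.normal(size=(200, 40))
    reduced, components, share = pca_reduce(features, n_components=10)
    print(reduced.shape)  # each recording is now described by 10 numbers
    print("variance retained:", share.sum())
```

The retained-variance figure is the usual guide for choosing how many components to keep: more components preserve more detail but leave the model with more data to process.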

Our findings showed that combining several techniques for capturing speech patterns improved system performance. On average, accuracy improved, with fewer word and character recognition errors.

The potential benefits of this work are significant:

  • Smoother documentation: Medical staff can record notes more efficiently, freeing up time for patient care.
  • Improved accuracy: Patient records become more reliable, reducing the chance of miscommunication.
  • Better healthcare outcomes: Enhanced communication can improve the quality of care.

This study highlights the promise of advanced speech recognition in healthcare. With further development, these systems can support medical professionals in delivering better patient care through efficient and accurate documentation.

Figure 1. Front page of the ADMEDVOICE corpus, which contains medical texts and their spoken equivalents