

Acoustical Society of America
159th Meeting Lay Language Papers





How Does the Brain Solve the Problem of Recognizing Speech in the Presence of Other Sound Sources and Distortions?

 

 

Nima Mesgarani - mnima@umd.edu

Johns Hopkins University

Baltimore, MD

 

Stephen David

Jonathan Fritz

Shihab Shamma

University of Maryland

College Park, MD

 

Popular version of paper 4pPP32

Presented Thursday afternoon, April 22, 2010

159th ASA Meeting, Baltimore, MD

 

 

Humans are so good at recognizing speech in noisy, reverberant environments, such as a crowded cocktail party, that we sometimes forget how complex this task is. The auditory system faces an extraordinary challenge: how to separate and decode information from the blooming, buzzing, overlapping jumble of sound sources that we typically encounter in everyday acoustic scenes. Analyzing the acoustic scene from the single sound pressure waveform that reaches the eardrum, and extracting one sound from it, is akin to recognizing and following the movement of a single boat from the waves it creates at the shore on a windy day, in the presence of many other boats. We know this is a hard problem because the best computer algorithms developed to mimic this human ability still fall an order of magnitude short of human performance.

 

In this study, we investigate this remarkable ability of the auditory system directly, using a powerful inverse reconstruction technique that we recently developed. In this method, we recorded the activity of a few hundred neurons in the primary auditory cortex of the ferret brain (the first cortical way station for auditory processing) while the animal listened to speech sounds. We then formulated a projection that reconstructs the sound from the neural responses. By comparing the speech reconstructed from the brain with the original, we can ask which aspects of the speech are represented in the primary auditory cortex, and thereby infer the neural computations performed along the auditory pathway.
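
In signal-processing terms, a reconstruction of this kind can be thought of as a regularized linear mapping from the recorded population responses back to the stimulus spectrogram. The Python sketch below illustrates only that general idea; the data shapes, the ridge-regression decoder, and every variable name are our own assumptions, not the exact procedure used in the study.

    import numpy as np

    # Toy data with assumed shapes (not from the paper):
    #   responses:   (n_timebins, n_neurons)  population firing rates
    #   spectrogram: (n_timebins, n_freqs)    time-frequency picture of the stimulus
    rng = np.random.default_rng(0)
    n_timebins, n_neurons, n_freqs, n_lags = 2000, 100, 32, 20
    responses = rng.poisson(2.0, size=(n_timebins, n_neurons)).astype(float)
    spectrogram = rng.normal(size=(n_timebins, n_freqs))

    def lagged_design(resp, n_lags):
        """Stack time-lagged copies of every neuron's response into one design matrix."""
        T, N = resp.shape
        X = np.zeros((T, N * n_lags))
        for lag in range(n_lags):
            X[lag:, lag * N:(lag + 1) * N] = resp[:T - lag]
        return X

    # Fit a ridge-regularized linear decoder:  spectrogram ~ X @ W
    X = lagged_design(responses, n_lags)
    lam = 100.0  # regularization strength (an assumed value)
    W = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ spectrogram)

    # Reconstruct the stimulus spectrogram from the neural responses alone
    reconstruction = X @ W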

 

Armed with this method, we then played speech mixed with noise and with reverberant echoes while recording the activity of the cortical neurons. Surprisingly, even in the presence of these distortions the brain's representation of speech was not impaired. We found that the sound reconstructed from these neurons' responses to distorted speech was largely cleaned up, as if the brain performs a restoration and enhancement of the speech. The early auditory pathway therefore hands the higher-level cognitive areas an enhanced, largely noise-free representation that carries the relevant information, free of distortion.
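
One simple way to quantify this kind of "cleaning", in the spirit of the sketch above and again only as an illustration rather than the analysis reported in the paper, is to ask whether the spectrogram reconstructed from responses to noisy speech correlates more strongly with the clean speech than with the noisy mixture that was actually played:

    import numpy as np

    def similarity(a, b):
        """Correlation between two spectrograms, flattened over time and frequency."""
        return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

    # With clean_spec, noisy_spec, and recon_from_noisy obtained as in the earlier
    # sketch (all three names are ours), a "cleaned" cortical representation would give
    #   similarity(recon_from_noisy, clean_spec) > similarity(recon_from_noisy, noisy_spec)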

 

For this knowledge to benefit next-generation engineering applications, we first need computational models that explain this noise-robustness phenomenon. Here, we start with an existing, widely used neural model and show that it cannot explain this observation. With a simple modification, however, the model can account for the observed effect, thereby providing a framework for developing improved computer algorithms for signal processing and speech recognition.
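
A widely used model of this kind is the linear spectro-temporal receptive field (STRF), which predicts a neuron's response as a weighted sum over the recent time-frequency content of the sound; a purely linear model of that form passes any added noise straight through to its output. The Python sketch below contrasts such a linear model with one plausible "simple modification", a slowly adapting per-channel level subtraction. Both the choice of the STRF as the baseline and the specific form of the modification are our assumptions for illustration, not the exact models used in the paper.

    import numpy as np

    rng = np.random.default_rng(1)
    n_timebins, n_freqs, n_lags = 1000, 32, 15
    spectrogram = np.abs(rng.normal(size=(n_timebins, n_freqs)))  # toy stimulus
    strf = rng.normal(scale=0.1, size=(n_lags, n_freqs))          # toy receptive field

    def linear_strf_response(spec, strf):
        """Plain linear STRF model: response(t) is a weighted sum of the last
        n_lags time bins of the spectrogram."""
        T, n_lags = spec.shape[0], strf.shape[0]
        resp = np.zeros(T)
        for t in range(n_lags, T):
            resp[t] = np.sum(strf * spec[t - n_lags:t])
        return resp

    def adapted_strf_response(spec, strf, tau=50.0):
        """Same STRF, but each frequency channel first subtracts a slowly adapting
        estimate of its own mean level -- one plausible stand-in for the 'simple
        modification' mentioned in the text (the exact form used is assumed)."""
        alpha = 1.0 / tau
        level = np.zeros_like(spec)
        level[0] = spec[0]
        for t in range(1, spec.shape[0]):
            level[t] = (1 - alpha) * level[t - 1] + alpha * spec[t]
        return linear_strf_response(spec - level, strf)

    clean = adapted_strf_response(spectrogram, strf)
    noisy = adapted_strf_response(spectrogram + 0.5, strf)  # stationary noise floor
    # The adapting level estimate absorbs the constant noise floor, so `noisy`
    # matches `clean` (up to numerical precision); the purely linear model, by
    # contrast, carries the noise offset straight through to its output.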