How the Brain Solves the Problem of Recognizing Speech in the Presence of Other Sound Sources and Distortions
Nima Mesgarani - mnima@umd.edu
Johns Hopkins University, Baltimore, MD

Stephen David, Jonathan Fritz, Shihab Shamma
University of Maryland, College Park, MD

Popular version of paper 4pPP32
Presented Thursday afternoon, April 22, 2010
159th ASA Meeting, Baltimore, MD
Humans are so good at recognizing speech in noisy and reverberant environments, such as a crowded cocktail party, that we sometimes forget how complex this task is. The auditory system faces an extraordinary challenge: how to separate and decode information from the blooming, buzzing, overlapping jumble of sound sources that we typically encounter in acoustic scenes. The problem of analyzing the acoustic scene from the sound pressure waveform that reaches the eardrum and extracting a single sound is similar to recognizing and following the movement of a single boat from the waves it creates at the shore on a windy day, in the presence of many other boats. We know this is a hard problem because the best computer algorithms developed to mimic this human ability still fall an order of magnitude short of human performance.
In this study, we investigate this remarkable feature of the auditory system directly, using a powerful inverse reconstruction technique that we have recently developed. In this method, we recorded the activity of a few hundred neurons in the primary auditory cortex of the ferret brain (the first cortical way station for auditory processing) while the animal listened to speech sounds. We then formulated a projection that can reconstruct the sound from the neural responses in the brain. By comparing the speech reconstructed from the brain with the original, we can ask which aspects of the speech are represented in the primary auditory cortex, and thereby infer the neural computations of the auditory pathway.
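To make the idea concrete, the sketch below (in Python, with made-up variable names and data shapes) illustrates one common form of stimulus reconstruction: a regularized linear mapping from time-lagged population responses to the stimulus spectrogram. It is only an illustration of the general approach; the method actually used in the study may differ in its details.

    # Minimal sketch of linear stimulus reconstruction (hypothetical shapes and
    # names; not the paper's exact method or regularization).
    import numpy as np

    def fit_reconstruction_filter(responses, spectrogram, lags=20, ridge=1e-3):
        """Fit a linear mapping from neural responses to the stimulus spectrogram.

        responses   : (T, N) firing rates of N neurons over T time bins
        spectrogram : (T, F) stimulus spectrogram with F frequency channels
        lags        : number of response time bins used to predict each stimulus bin
        ridge       : L2 penalty that stabilizes the least-squares fit
        """
        T, N = responses.shape
        # Design matrix holding lagged copies of every neuron's response.
        X = np.zeros((T, N * lags))
        for lag in range(lags):
            X[lag:, lag * N:(lag + 1) * N] = responses[:T - lag]
        # Regularized least squares: W maps lagged responses to the spectrogram.
        W = np.linalg.solve(X.T @ X + ridge * np.eye(N * lags), X.T @ spectrogram)
        return W, X

    def reconstruct(W, X):
        """Apply the fitted filter to obtain the reconstructed spectrogram (T, F)."""
        return X @ W

The reconstruction can then be compared with the original spectrogram, for example by correlating them channel by channel, to ask which features of the speech survive in the cortical responses.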
With this method in hand, we then played speech mixed with noise and reverberant echoes while recording the activity of the cortical neurons. Surprisingly, even in the presence of noise, the brain's representation of speech was not impaired. We found that the sound reconstructed from the responses of these neurons to distorted speech was largely cleaned up, as if the brain performs a restoration and enhancement of the speech. The early auditory pathway therefore provides higher-level cognitive areas with a cleaned, enhanced representation that contains only the relevant information, free from noise and distortion.
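One simple way to quantify this cleaning effect, sketched below as our own illustration rather than the paper's exact analysis, is to correlate the spectrogram reconstructed from responses to noisy speech with both the clean and the noisy stimulus spectrograms: if the reconstruction correlates better with the clean version, the cortical representation is closer to the clean speech than to the sound that actually reached the ear.

    # Illustrative comparison (our sketch, not the study's exact analysis).
    import numpy as np

    def spectrogram_correlation(a, b):
        """Mean correlation between two spectrograms (T, F), channel by channel."""
        corrs = [np.corrcoef(a[:, f], b[:, f])[0, 1] for f in range(a.shape[1])]
        return float(np.mean(corrs))

    # recon_noisy : spectrogram reconstructed from responses to noisy speech
    # clean_spec  : spectrogram of the original clean speech
    # noisy_spec  : spectrogram of the noisy speech that reached the ear
    #
    # r_clean = spectrogram_correlation(recon_noisy, clean_spec)
    # r_noisy = spectrogram_correlation(recon_noisy, noisy_spec)
    # r_clean > r_noisy would indicate that the neural representation resembles
    # the clean speech more than the distorted input.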
For this knowledge to benefit next-generation engineering applications, we first need computational models that explain this noise-robustness phenomenon. Here, we start with an existing, widely used neural model and demonstrate that it is unable to explain this observation. With a simple modification, however, the model can account for the observed effect, thereby providing a framework for developing improved computer algorithms for signal processing and speech recognition.
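As an illustration only, the sketch below contrasts a standard linear spectro-temporal receptive field (STRF) model with one possible kind of "simple modification": an adaptive gain stage that normalizes each frequency channel by its recent average level, so that a steady background is discounted before the linear filtering stage. This is our guess at the kind of change involved; the actual model and modification are described in the full paper.

    # Illustrative sketch: a linear STRF model, and the same model preceded by an
    # adaptive-gain stage (our assumption about the "simple modification").
    import numpy as np

    def linear_strf_response(spectrogram, strf):
        """Standard linear model: filter the spectrogram (T, F) with an STRF (L, F)."""
        T, F = spectrogram.shape
        L = strf.shape[0]
        out = np.zeros(T)
        for t in range(L, T):
            out[t] = np.sum(spectrogram[t - L:t] * strf)
        return out

    def adaptive_gain(spectrogram, tau=50, eps=1e-6):
        """Divide each channel by a running average of its own level, so a steady
        background is discounted before the linear stage."""
        T, F = spectrogram.shape
        mean = np.zeros(F)
        normalized = np.zeros_like(spectrogram)
        alpha = 1.0 / tau
        for t in range(T):
            mean = (1 - alpha) * mean + alpha * spectrogram[t]
            normalized[t] = spectrogram[t] / (mean + eps)
        return normalized

    # Baseline model:  response = linear_strf_response(noisy_spectrogram, strf)
    # Modified model:  response = linear_strf_response(adaptive_gain(noisy_spectrogram), strf)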