Speech-in-noise recognition as both an experience- and signal-dependent process

Ann Bradlow – abradlow@northwestern.edu
Department of Linguistics
Northwestern University
2016 Sheridan Road
Evanston, IL 60208
Popular version of paper 4aAAa1

Presented Thursday morning, October 30, 2014
168th ASA Meeting, Indianapolis

Real-world speech understanding in naturally “crowded” auditory soundscapes is a complex operation that acts upon an integrated speech-plus-noise signal. Does all of the auditory “clutter” that surrounds speech make its way into our heads along with the speech? Or do we perceptually isolate and discard background noise at an early stage of processing, based on general acoustic properties that differentiate sounds from non-speech noise sources from sounds produced by human vocal tracts (i.e. speech)?

We addressed these questions by first examining the ability to tune into speech while simultaneously tuning out noise. Is this ability influenced by properties of the listener (their experience-dependent knowledge) as well as by properties of the signal (factors that make it more or less difficult to separate a given target from a given masker)? Listeners were presented with English sentences in a background of competing speech that was either English (matched-language, English-in-English recognition) or another language (mismatched-language, e.g. English-in-Mandarin recognition). Listeners were either native or non-native listeners of English and were either familiar or unfamiliar with the language of the to-be-ignored background speech (English, Mandarin, Dutch, or Croatian). Overall, we found that matched-language speech-in-speech understanding (English-in-English) is significantly harder than mismatched-language speech-in-speech understanding (e.g. English-in-Mandarin). Importantly, listener familiarity with the background language modulated the magnitude of the mismatched-language benefit. On a smaller time scale of experience, we also found that this benefit is modulated by short-term adaptation to a consistent background language within a test session. Thus, we conclude that speech understanding in conditions that involve competing background speech engages experience-dependent knowledge in addition to signal-dependent processes of auditory stream segregation.

Experiment Series 2 then asked whether listeners’ memory traces for spoken words heard in concurrent background noise remain associated with that noise. Listeners were presented with a list of spoken words, and for each word they were asked to indicate if the word was “old” (i.e. had occurred previously in the test session) or “new” (i.e. had not been presented over the course of the experiment). All words were presented with concurrent noise that was either aperiodic in a limited frequency band (i.e. like wind in the trees) or a pure tone. Importantly, both types of noise were clearly from a sound source that was very different from the speech source. In general, words were more likely to be correctly recognized as previously heard if the noise on the second presentation matched the noise on the first presentation (e.g. pure tone on both first and second presentations of the word). This suggests that the memory trace for spoken words that have been presented in noisy backgrounds includes an association with the specific concurrent noise. That is, even sounds that quite clearly emanate from an entirely different source remain integrated with the cognitive representation of speech rather than being permanently discarded during speech processing.

These findings suggest that real-world speech understanding in naturally “crowded” auditory soundscapes involves an integrated speech-plus-noise signal at various stages of processing and representation. All of the auditory “clutter” that surrounds speech somehow makes its way into our heads along with the speech, leaving us with exquisitely detailed auditory memories from which we build rich representations of our unique experiences.

Important note: The work in this presentation was conducted in a highly collaborative laboratory at Northwestern University. Critical contributors to this work are former group members Susanne Brouwer (now at Utrecht University, Netherlands), Lauren Calandruccio (now at UNC-Chapel Hill), and Kristin Van Engen (now at Washington University, St. Louis), and current group member, Angela Cooper.
