154th ASA Meeting, New Orleans, LA


What is attention, and what is it doing in speech perception?

Alexander L. Francis - francisa@purdue.edu
Department of Speech, Language and Hearing Sciences, Purdue University
West Lafayette, IN 47907

Popular version of paper 2aSCa1
Presented Wednesday morning, November 28, 2007

In 1890 William James described attention as "...the taking possession by the mind, in clear and vivid form, of one out of what seem several simultaneously possible objects or trains of thought. ... It implies withdrawal from some things in order to deal effectively with others..." This characterization remains remarkably relevant today. In this talk, I argue that selective attention plays an important role in a variety of speech perception processes and, conversely, that research on speech perception can advance our understanding of the mechanisms of attention more generally.

The task of understanding someone talking in a crowded room (dubbed the "cocktail party problem" by Colin Cherry in a 1953 paper in the Journal of the Acoustical Society of America) exemplifies a situation in which listeners must focus on important perceptual information (the speech of the person talking) and exclude irrelevant information (the speech of everyone else in the room). The better a listener is at filtering out irrelevant information, and the better they are at focusing attention on relevant information, the easier the listening task will be. Unfortunately, most modern studies of attention have used visual stimuli and thus may have seemed irrelevant to speech perception researchers. Recently, however, there has been a resurgence of studies investigating attention in hearing, and specifically in speech, as well as a greater interest in applying the insights of attention research to explaining aspects of speech perception.

One area in which reference to attention is becoming common is in explaining how listeners learn to recognize unfamiliar speech sounds. Producing a speech sound (such as the sound [b] as in "bay") generates a complex pattern of acoustic properties (cues), some but not all of which are very useful to listeners for identifying that the speaker did, indeed, say "bay" and not "pay" or "day". Speakers of different languages often rely on different cues, and listeners of one language may have difficulty identifying the specific acoustic cues that are important in another, as in the case of native speakers of Japanese trying to hear the difference between [r] and [l] in English. One way to describe this phenomenon is in terms of attention: Native Japanese speakers learning English must learn to pay attention to the cues that distinguish [r] from [l] in English, and also to ignore cues that, while useful for distinguishing certain Japanese sounds, are actually misleading when listening to English.

Interestingly, since listeners are not consciously aware of the specific acoustic properties to which they are attending, traditional conceptualizations of attention may not be appropriate here. Although the goals and effects are similar to the application of attention as traditionally conceived (focus on the useful information and filter out the irrelevant), we do not yet know whether, or to what degree, the neural mechanisms underlying the two kinds of processes (speech sound learning and selective attention) actually overlap. Research in our lab is currently exploring the difference between learning to attend to new cues and learning to ignore familiar ones, the effect such learning may have on the capacity demands of listening to speech, and the degree to which such changes in the perception of acoustic cues may be related to more commonly studied shifts in selective attention.

While learning studies tend to invoke attentional mechanisms because of their ability to account for the selection of one feature over another, other studies have focused on the effect of limited attentional capacity on speech perception. For example, it is commonly accepted that one consequence of normal aging is the gradual loss of cognitive skills, leading to a decline in the ability to understand and remember complex information quickly and accurately. However, recent research suggests that many such age-related impairments may result not so much from a decline in cognitive abilities as from the development of minor perceptual deficits that would previously have been considered trivial. According to these theories, as perceptual acuity declines with age, listeners must work harder simply to recognize and process the acoustic cues necessary for identifying words and phrases. By devoting more attention to these acoustic properties of speech, older listeners have less attention available for conceptual processing, such as understanding and remembering the message being spoken. Thus, even words and messages that were initially heard well enough to be understood perfectly end up being harder to remember later.

There is also some suggestion that attentional capacity limitations could be related to certain developmental language disabilities such as specific language impairment (SLI) and dyslexia. Researchers have found that children with SLI tend to exhibit poorer than normal attentional abilities, and also tend to show subtle difficulties with auditory processing. One interpretation of these results is that, when a sensory deficit requires a child to devote more than the usual amount of attention to recognizing the sounds of speech, concomitant attentional limitations may lead to serious difficulties with acquiring normal language abilities, even when neither the perceptual nor the attentional deficit would have been detrimental on its own.

Ultimately, the relationship between attention and speech perception is not unidirectional. In this talk I also argue that speech research can contribute uniquely to the study of general properties of attention. For example, theories of visual attention tend to be divided between those that propose that attention is directed toward objects and those that propose that attention is directed toward regions in space. However, because visual objects typically cannot occupy the same physical space, it is difficult to distinguish between the predictions of object- and location-based theories of attention using visual stimuli. In contrast, two auditory objects can occupy the same spatial location without difficulty. Moreover, speech, by virtue of its ecological importance and relationship with language, provides a rich context in which listeners' attention may be directed, either overtly or covertly. Auditory objects can be defined by content ("listen for words for animals") or speaker ("listen to the male voice") as well as location ("listen to the voice on the left") or even acoustic features ("listen to the high-pitched voice"). In our lab we have recently begun a series of studies looking at how directing attention to spatially defined auditory objects compares with directing attention to objects defined by other properties, such as the speaker's gender. Such experiments will provide new data relevant to better understanding the cognitive processes underlying speech perception, and will also bear on current debates within the field of attention research more generally.

Work supported by NIH NIDCD grant R03 DC006811.
