ASA Lay Language Papers
163rd Acoustical Society of America Meeting


Making Sense of Sounds

Susan L. Denham -- sdenham@plymouth.ac.uk
School of Psychology
University of Plymouth, UK

Istvan Winkler -- iwinkler@cogpsyphy.hu
Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences
Hungarian Academy of Sciences, Hungary
Institute of Psychology, University of Szeged, Hungary

Robert W. Mill -- robert.mill.uop@googlemail.com
MRC Institute of Hearing Research
Nottingham, UK

Tamas M. Bohm -- bohm@cogpsyphy.hu
Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences
Hungarian Academy of Sciences, Hungary
Department of Telecommunications and Media Informatics
Budapest University of Technology and Economics, Hungary

Alexandra Bendixen -- alexandra.bendixen@uni-leipzig.de
Institute for Psychology
University of Leipzig, Germany

Popular Version of Paper 1pPP1
Presented Monday morning, May 14, 2012
163rd ASA Meeting, Hong Kong

Tiny vibrations of the air molecules around us convey information about remote objects and their behaviour. To decode this useful information we have evolved specialised sensors and processes: the ears and the auditory system. However, what our ears receive is actually a mixture of the signals generated by whatever sound-emitting sources happen to be present, and this mixture can of course change from one moment to the next. So, how do we make sense of the mixture and form mental images of the sound sources around us? For each sound event, a footstep for example, how do we decide whether it was caused by one of the sources we know about, and if so, which one? Conversely, how can we tell whether an event was caused by a new source within the scene, or whether it originated from a known source whose behaviour has changed? In sum, how do we achieve the remarkable feat of making timely yet accurate perceptual decisions within an ever-changing acoustic environment?

We propose that the auditory system constantly seeks out patterns in the incoming signals. These patterns range over many different time scales, from the very rapid vibrations perceived as pitch, to sequences of sound events, perceived for example as a melody. Once a pattern has been detected, we know which parts of the signal contributed to it and therefore belong together. Even more importantly, we can develop expectations of what should come next and when. We can then use mismatches between these expectations and what actually happens to refine our mental representation of the pattern, and in this way improve our understanding of the world. There is growing experimental evidence that perception is such a predictive process. Even sleeping newborn babies are now known to extract patterns from sound sequences and to detect when their expectations are violated, for example by a missing beat in a rhythmic sequence; the violation shows up as altered brain signals.
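
To make the idea of expectation-driven refinement concrete, here is a minimal sketch in Python of how a prediction-error signal can update an internal estimate, in this case an expected interval between beats. The function name, learning rate and interval values are purely illustrative assumptions, not part of our model.

    # Hypothetical sketch: refining an expected beat interval from prediction errors.
    # The update rule and all values here are illustrative, not the authors' model.
    def update_expectation(expected, observed, learning_rate=0.2):
        """Nudge the expected interval toward the observed one."""
        error = observed - expected              # the mismatch (prediction error)
        return expected + learning_rate * error

    expected = 0.50                              # initial guess: 0.5 s between beats
    for observed in [0.60, 0.58, 0.61, 0.59, 0.60]:
        expected = update_expectation(expected, observed)
        print(f"expected interval: {expected:.3f} s")

Each mismatch nudges the estimate toward what was actually heard, so the representation gradually converges on the true pattern while remaining sensitive to change.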

We have been studying the problem of auditory scene analysis, outlined above, using auditory streaming experiments. In these experiments we use long sequences of tones arranged in a repeating ABA–ABA–... pattern, where the A’s and B’s are tones of different pitch and “–” indicates a silent interval equal in duration to the tones. Even this very simple sequence has more than one interpretation. We can hear all the tones as coming from a single source with a kind of galloping rhythm (...xxx xxx xxx...), or as coming from two sources, a fast one (...A A A...) and a slow one (...B   B   B...), each with an even rhythm. Even more interestingly, when people listen to such a sequence for a long time, their perception switches between these alternative interpretations, even though the sequence itself remains the same. This phenomenon is known as perceptual bistability, and it resembles well-known examples in vision, such as Rubin’s vase-face illusion, in which perception switches back and forth between two faces and a single vase. An important aspect of perceptual bistability is that switching occurs largely outside voluntary control: people cannot prevent perceptual switches even if they try, and the duration of each perception varies quite randomly, although it can be influenced by a number of stimulus, task and individual factors.
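
For readers who would like to reproduce this kind of stimulus, the following short Python sketch (assuming NumPy for waveform synthesis) builds an ABA– sequence. The sample rate, tone frequencies, durations and repetition count are arbitrary example values, not the parameters of any particular experiment.

    import numpy as np

    # Illustrative ABA- streaming stimulus; all parameter values are examples only.
    FS = 44100               # sample rate (Hz)
    TONE_DUR = 0.075         # tone duration (s); "-" is a silence of the same length
    F_A, F_B = 440.0, 554.4  # A and B frequencies (roughly 4 semitones apart)

    def tone(freq, dur=TONE_DUR, fs=FS):
        t = np.arange(int(dur * fs)) / fs
        return np.sin(2 * np.pi * freq * t)

    silence = np.zeros(int(TONE_DUR * FS))
    triplet = np.concatenate([tone(F_A), tone(F_B), tone(F_A), silence])  # one ABA- cycle
    sequence = np.tile(triplet, 30)   # repeat to form a long sequence

Varying the pitch separation between F_A and F_B, or the tone and gap durations, is exactly the kind of manipulation used to shift listeners between the one-stream and two-stream interpretations.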

Perceptual bistability has turned out to be a very useful way to probe the processing of sensory information. Based on a large body of experimental evidence, mostly collected in vision, it is now widely accepted that what is consciously perceived is determined by some form of competition between the various possible perceptions. Our novel contribution is to show how the search for patterns in sound sequences, and competition between them, can help to explain the (changing) contents of auditory perceptual awareness. Specifically, we propose that the auditory system is continuously attempting to link incoming sound events to its representations of what has gone before. Each event triggers the start of a new pattern, and probabilistic connections between the event and existing patterns are also formed. In time this results in the formation of representations of all possible repeating patterns that can be extracted from the ongoing sequence. The representation of a pattern is strengthened if it successfully predicts an incoming event, and weakened and eventually eliminated if its prediction fails. Representations that predict the same sound event suppress each other. Thus representations compete, but only if they try to predict the same event at the same time. This means that some patterns are compatible with each other and so are able to occupy our awareness at the same time, with one being perceived in the foreground, and the other(s) as background. On the other hand, patterns that compete with each other are incompatible, and it is difficult, if not impossible, to perceive them simultaneously.
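
The following toy Python sketch illustrates these principles in drastically simplified form: candidate patterns predict tones of the ABA– cycle, gain strength when their predictions succeed, lose strength when they fail, and suppress rivals that predict the same event. The phases, update constants and elimination rule are illustrative assumptions, not the parameters of our actual model.

    import itertools

    # Toy illustration of competing pattern representations; all constants are
    # illustrative choices, not the authors' actual model parameters.
    def event_at(t):
        return (t % 4) != 3      # ABA- grid: tones at phases 0, 1, 2; silence at 3

    class Pattern:
        def __init__(self, name, phases):
            self.name, self.phases, self.strength = name, phases, 1.0
        def predicts(self, t):
            return (t % 4) in self.phases

    patterns = [
        Pattern("integrated (ABA-)", {0, 1, 2}),
        Pattern("fast A stream",     {0, 2}),
        Pattern("slow B stream",     {1}),
        Pattern("every time step",   {0, 1, 2, 3}),   # wrongly predicts the silence
    ]

    for t in range(60):
        for p in patterns:
            if p.predicts(t):
                p.strength += 0.10 if event_at(t) else -0.40  # reward hits, punish misses
        rivals = [p for p in patterns if event_at(t) and p.predicts(t)]
        for p, q in itertools.permutations(rivals, 2):
            p.strength -= 0.02 * q.strength   # same-event predictions suppress each other
        patterns = [p for p in patterns if p.strength > 0]    # failing patterns die out

    for p in patterns:
        print(f"{p.name}: strength {p.strength:.2f}")

In this toy the pattern that wrongly predicts an event during the silent gaps is steadily weakened and eliminated, while the compatible "fast A" and "slow B" patterns survive side by side because they never predict the same tone.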

We have built a computational model based on these principles and shown that it simulates very well how people perceive sound sequences such as those described above. The model can predict many aspects of human auditory perceptual organisation, including which perception will typically occur first and how long it will tend to last, the likelihood of the other possible perceptions, the average durations of the different perceptions, and the effects of manipulating the timing and feature separation of the tones in the sequence. Although the model is currently expressed in a rather abstract way, it has been formulated with a view towards understanding perceptual processes as they occur in the brain.
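
Our model itself is not reproduced here, but the flavour of its switching dynamics can be conveyed by a generic rivalry-style simulation: two interpretations compete through mutual inhibition, slowly fatigue while dominant, and are perturbed by noise, which together yield irregular alternation. The equations and constants below are a standard textbook-style construction, assumed for illustration, and are not our model's actual formulation.

    import math, random

    # Generic noisy-competition sketch (mutual inhibition + adaptation + noise);
    # parameter values are illustrative, not those of the paper's model.
    x = [1.0, 0.9]            # activities of the two competing interpretations
    a = [0.0, 0.0]            # slow adaptation ("fatigue") of each
    dt, inhibit, adapt_rate, noise = 0.01, 2.0, 0.05, 0.2

    dominant, switches, t = 0, [], 0.0
    for _ in range(200000):   # 2000 s of simulated time
        t += dt
        for i in (0, 1):
            drive = 1.0 - inhibit * x[1 - i] - a[i]   # input minus rivalry and fatigue
            x[i] += dt * (max(drive, 0.0) - x[i]) + noise * math.sqrt(dt) * random.gauss(0, 1)
            a[i] += dt * adapt_rate * (x[i] - a[i])   # adaptation tracks activity slowly
        if x[1 - dominant] > x[dominant]:             # the other interpretation takes over
            dominant = 1 - dominant
            switches.append(t)

    durations = [b - s for s, b in zip(switches, switches[1:])]
    if durations:
        print(f"{len(durations)} dominance periods, "
              f"mean {sum(durations) / len(durations):.2f} s")

Because the noise term makes each dominance period end at a different moment, the simulated durations vary randomly from switch to switch, just as listeners' reported percept durations do.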

One objection that may be raised against this account of perception and perceptual organisation is the following: how can random perceptual switching possibly be correct when our everyday experience is one of relative perceptual stability? We have conducted many experiments to explore this question, and so far it seems that if the sequence continues for long enough, there is no manipulation that will stabilise perception indefinitely. Even if the first perceptual switch takes a very long time to occur, once switching begins it continues at a faster rate than in the period preceding the initial switch. We therefore suggest that the potential for perceptual switching is always present, but that the dynamic nature of the real world rarely gives it a chance to manifest: natural scenes usually change before prolonged exposure to an unchanging signal can trigger switching. The benefit of having a perceptual system poised on the verge of instability is that it can never get stuck in one (possibly incorrect) interpretation. Our data suggest that the brain builds many interpretations of the sequence in parallel and that we experience each with a probability related to its likelihood. Goal-directed processes are also able to influence perception relatively easily by biasing the competition for perceptual awareness. So far the performance of artificial perceptual systems falls far short of that of biological systems. Perhaps it is this ability to simultaneously represent a number of different interpretations of the world, and to flexibly switch between them, that underlies the robustness of natural perception.