ASA Lay Language Papers
162nd Acoustical Society of America Meeting


Does musical training enhance speech perception? If so, why?

Aniruddh D. Patel – apatel@nsi.edu
The Neurosciences Institute
10640 John Jay Hopkins Drive
San Diego, CA 92121

Popular version of paper 2pMUa1
Presented Tuesday afternoon, Nov 1, 2011
162nd ASA Meeting, San Diego, Calif.

Speech is our primary communication system, and problems with speech perception (such as hearing speech in noise) can be very distressing, especially as we grow older.  Can we do anything to preserve or even enhance our hearing as we age, apart from avoiding exposure to loud sounds?  Surprisingly, recent research suggests that musical training (learning to play an instrument or sing) sharpens the brain’s sensory processing of speech sounds.  Furthermore, this enhancement is reflected in improved real-world abilities such as hearing speech in noise [1-3].

These findings are exciting, but they also raise a fundamental question. Why would musical training sharpen the brain’s sensory processing of speech?  After all, musical instruments (such as guitars or trumpets) sound very different from human voices.  Furthermore, spoken sentences are acoustically very complex, and many musical instruments produce sound patterns that are simpler by comparison (think of a flute melody vs. human speech).  Why would learning to produce relatively simple acoustic patterns (musical melodies) enhance the brain’s processing of complex acoustic patterns (speech)?

One possible explanation is offered by the OPERA hypothesis [4].  OPERA stands for ‘Overlap, Precision, Repetition, Emotion, and Attention’.  This paper focuses on the ‘precision’ component (the ‘P’ in OPERA), which is the key novel idea of the hypothesis.  The idea is that musical training benefits speech processing because it demands greater precision in certain aspects of auditory processing than does ordinary speech perception. 

To illustrate this idea, consider pitch (the perceived highness or lowness of a sound), which is an important auditory feature of both speech and music [5].  Pitch is distinct from timbre (sound quality).  For example, the English vowels “ee” and “oo” have different timbres but can be spoken on the same pitch.  Conversely, any vowel can be spoken with different pitch heights – think of a small child vs. a grown man saying “ee”.  The auditory system has neural mechanisms for extracting pitch from complex sounds [6], and similar brain circuits are very likely involved in extracting pitch in speech and music, especially in lower brain centers in the brainstem (this satisfies the ‘Overlap’ condition of OPERA).

In terms of perception, pitch plays an important role in both speech and music.  For example, in English and many other languages, questions (such as “Is it your birthday?”) are often marked by a final pitch rise.  Furthermore, in ‘tone languages’ such as Mandarin Chinese, changing the pitch pattern on a word can entirely change its meaning (e.g., from “mother” to “horse”!).  Pitch is also important for musical melodies: if the pitches of a melody are changed slightly, the results can be very salient.  (Listen to sound example 1: can you hear the wrong notes?  In physical terms, they are very close in frequency to the correct notes that they replaced.)
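To give a rough sense of how close in frequency a "wrong" note can be, here is a small illustrative calculation.  It is only a sketch under stated assumptions: it supposes the wrong note is one semitone above the correct one in standard equal temperament, with the correct note at 440 Hz.  Neither assumption comes from the sound example itself, and the function names are made up for this illustration.

```python
import math

def shift_by_semitones(freq_hz: float, semitones: float) -> float:
    """Return the frequency reached by moving `semitones` equal-tempered steps."""
    # In equal temperament, each semitone multiplies frequency by 2**(1/12).
    return freq_hz * 2 ** (semitones / 12)

def interval_in_cents(f1_hz: float, f2_hz: float) -> float:
    """Return the size of the interval from f1 to f2 in cents (100 per semitone)."""
    return 1200 * math.log2(f2_hz / f1_hz)

# Assumed values, chosen only for illustration.
correct_hz = 440.0                              # hypothetical "correct" note
wrong_hz = shift_by_semitones(correct_hz, 1)    # hypothetical "wrong" note
percent_change = 100 * (wrong_hz - correct_hz) / correct_hz

print(f"correct note: {correct_hz:.1f} Hz")
print(f"wrong note:   {wrong_hz:.1f} Hz")
print(f"difference:   {percent_change:.1f}% "
      f"({interval_in_cents(correct_hz, wrong_hz):.0f} cents)")
```

Under these assumptions the two notes differ by about 6% in frequency, yet in a familiar melody a change of that size is typically easy to hear.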

Pitch is obviously important for both speech and music.  Does musical training demand greater precision than speech perception in terms of pitch processing?  One way to address this question is to ask “how important is hearing fine pitch distinctions for speech comprehension vs. for musical training?” 

When listening to spoken sentences, if you can’t hear fine distinctions in pitch, the chances are that you will still comprehend the basic meaning of the sentences.  This is because speech has many redundant cues to meaning.  For example, if a listener doesn’t perceive the pitch rise in “Is it your birthday?”, he or she will still know the sentence is a question by virtue of word choice (“Is it…”) and grammar.  Indeed, it was recently discovered that native speakers of Mandarin Chinese can fully understand the meaning of sentences spoken on a monotone, i.e., sentences in which all pitch variation has been removed [7].  Thus even in this tone language, listeners can infer the identity of spoken words from the remaining acoustic cues.  The larger point is that understanding the basic meaning of sentences does not require the ability to hear fine distinctions in pitch, due to the redundancy of spoken language.  Of course, pitch can help convey a variety of information in speech, such as emphasis, boundaries between phrases, emotional tone, and attitude (e.g., serious vs. ironic), but extracting the basic semantic meaning of a sentence does not seem to require high-precision pitch encoding by the brain.

In contrast, musical training does demand high-precision pitch encoding.  For musicians to tell if they are in tune and playing the right notes, they need to hear subtle distinctions in pitch (recall sound example 1, where the wrong notes were quite close in pitch to the correct notes they replaced).  According to OPERA, music’s high-precision demands on pitch encoding set the stage for musical training to sharpen pitch processing in speech.  The remaining three conditions of the OPERA hypothesis (Emotion, Repetition, and Attention) facilitate neural plasticity, i.e., experience-dependent changes in brain structure.  According to OPERA, when all five conditions are met, musical training drives pitch-processing networks to function at a higher level than needed for ordinary speech processing.  Yet, since speech and music share these networks, speech perception benefits.  For example, higher-precision encoding of voice pitch appears to enhance speech perception in noise [2, 3, cf. 7].

Of course, there is much more to speech (and music) than pitch, and further work is needed to determine what other aspects of speech processing benefit from musical training.  The results of such work can help guide the exploration of music as a tool for enhancing or rehabilitating certain speech perception abilities.

[Sound example 1: courtesy of Dennis Drayna, Ph.D., from the “Distorted Tunes Test”]

References

1. Kraus, N. & B. Chandrasekaran. 2010. Music training for the development of auditory skills. Nat. Rev. Neurosci. 11:599-605.

2. Parbery-Clark, A., E. Skoe, & N. Kraus. 2009. Musical experience limits the degradative effects of background noise on the neural processing of sound. J. Neurosci. 29: 14100–14107.

3. Parbery-Clark, A., D.L. Strait, S. Anderson, E. Hittner, & N. Kraus. 2011. Musical training and the aging auditory system: implications for cognitive abilities and hearing speech in noise. PLoS ONE 6: e18082. doi: 10.1371/journal.pone.0018082

4. Patel, A.D. 2011. Why would musical training benefit the neural encoding of speech?  The OPERA hypothesis.  Frontiers in Psychology 2:142. doi: 10.3389/fpsyg.2011.00142

5. Patel, A.D. 2008. Music, Language, and the Brain. Oxford Univ. Press. New York.

6. McDermott, J.H., & A.J. Oxenham. 2008. Music perception, pitch, and the auditory system. Current Opinion in Neurobiology 18: 452–463.

7. Patel, A.D., Y. Xu, & B. Wang. 2010. The role of F0 variation in the intelligibility of Mandarin sentences. Proceedings of Speech Prosody 2010, May 11-14, 2010, Chicago, IL, USA.