speech perception | Acoustics.org

Can aliens found in museums teach us about learning sound categories?

Christopher Heffner – ccheffne@buffalo.edu

Department of Communicative Disorders and Sciences, University at Buffalo, Buffalo, NY, 14214, United States

Popular version of 4aSCb6 – Age and category structure in phonetic category learning
Presented at the 186th ASA Meeting
Read the abstract at https://doi.org/10.1121/10.0027460

–The research described in this Acoustics Lay Language Paper may not have yet been peer reviewed–

Imagine being a native English speaker learning to speak French for the first time. You’ll have to do a lot of learning, including learning new ways to fit words together to form sentences and a new set of words. Beyond that, though, you must also learn to tell sounds apart that you’re not used to. Even the French word for “sound”, son, is different from the word for “bucket”, seau, in a way that English speakers don’t usually pay attention to. How do you manage to learn to tell these sounds apart when you’re listening to others? You need to group those sounds into categories. In this study, museum and library visitors interacting with aliens in a simple game helped us to understand which categories that people might find harder to learn. The visitors were of many different ages, which allowed us to see how this might change as we get older.

One thing that might help would be if you come with knowledge that certain types of categories are impossible. If you’re in a new city trying to choose a restaurant, it can be really daunting if you decide to investigate every single restaurant in the city. The decision becomes less overwhelming if you narrow yourself to a specific cuisine or neighborhood. Similarly, if you’re learning a new language, it might be very difficult if you entertain every possible category, but limiting yourself to certain options might help. My previous research (Heffner et al., 2019) indicated that learners might start the language learning process with biases against complicated categories, like ones that you need the word “or” to describe. I can describe a day as uncomfortable in its temperature if it is too hot or too cold. We compared these complicated categories to simple ones and saw that the complicated ones were hard to learn.

In this study, I studied this sort of bias across lots of different ages. Brains change as we grow into adulthood and continue to change as we grow older. I was curious whether the bias we have against those certain complicated categories would shift with age, too. To study this, I enlisted visitors to a variety of community sites, by way of partnerships with, among others, the Buffalo Museum of Science, the Rochester Museum and Science Center, and the West Seneca Public Library, all located in Western New York. My lab brought portable equipment to those sites and recruited visitors. The visitors were able to learn about acoustics, a branch of science they had probably not heard much about before; the community spaces got a cool, interactive activity for their guests; and we as the scientists got access to a broader population than we could get sitting inside the university.

Figure 1. The three aliens that my participants got to know over the course of the experiment. Each alien made a different combination of sounds, or no sounds at all.

We told the visitors that they were park rangers in Neptune’s first national park. They had to learn which aliens in the park made which sounds. The visitors didn’t know that the sounds they were hearing were taken from German. Over the course of the experiment, they learned to group sounds together according to categories that we made up in the German speech sounds. What we found is that learning of simple and complicated categories was different across ages. Nobody liked the complicated categories. Everyone, no matter their age, found them difficult to learn. However, the responses to the simple categories differed a lot depending on the age. Kids found them very difficult, too, but learning got easier for the teens. Learning peaked in young adulthood, then was a bit harder for those in older age. This suggests that the brain systems that help us learn simple categories might change over time, while everyone seems to have the bias against the complicated categories.

Figure 2. A graph, created by me, showing how accurate people were at matching the sounds they heard with aliens. There are three pairs of bars, and within each pair, the red bars (on the right) show the accuracy for the simple categories, while the blue bars (on the left) show the accuracy for the complicated categories. The left two bars show participants aged 7-17, the middle two bars show participants aged 18-39, and the right two show participants aged 40 and up. Note that the simple categories are easier than the complicated ones for participants above 18, while for those younger than 18, there is no difference between the categories.

4pSC11 – The role of familiarity in audiovisual speech perception

Chao-Yang Lee – leec1@ohio.edu
Margaret Harrison – mh806711@ohio.edu
Ohio University
Grover Center W225
Athens, OH 45701

Seth Wiener – sethw1@cmu.edu
Carnegie Mellon University
160 Baker Hall, 5000 Forbes Avenue
Pittsburgh, PA 15213

Popular version of paper 4pSC11, “The role of familiarity in audiovisual speech perception”
Presented Thursday afternoon, December 7, 2017, 1:00-4:00 PM, Studios Foyer
174^th ASA Meeting, New Orleans

When we listen to someone talk, we hear not only the content of the spoken message, but also the speaker’s voice carrying the message. Although understanding content does not require identifying a specific speaker’s voice, familiarity with a speaker has been shown to facilitate speech perception (Nygaard & Pisoni, 1998) and spoken word recognition (Lee & Zhang, 2017).

Because we often communicate with a visible speaker, what we hear is also affected by what we see. This is famously demonstrated by the McGurk effect (McGurk & MacDonald, 1976). For example, an auditory “ba” paired with a visual “ga” usually elicits a perceived “da” that is not present in the auditory or the visual input.

Since familiarity with a speaker’s voice affects auditory perception, does familiarity with a speaker’s face similarly affect audiovisual perception? Walker, Bruce, and O’Malley (1995) found that familiarity with a speaker reduced the occurrence of the McGurk effect. This finding supports the “unity” assumption of intersensory integration (Welch & Warren, 1980), but challenges the proposal that processing facial speech is independent of processing facial identity (Bruce & Young, 1986; Green, Kuhl, Meltzoff, & Stevens, 1991).

In this study, we explored audiovisual speech perception by investigating how familiarity with a speaker affects the perception of English fricatives “s” and “sh”. These two sounds are useful because they contrast visibly in lip rounding. In particular, the lips are usually protruded for “sh” but not “s”, meaning listeners can potentially identify the contrast based on visual information.

Listeners were asked to watch/listen to stimuli that were audio-only, visual-only, audiovisual-congruent, or audiovisual-incongruent (e.g., audio “save” paired with visual “shave”). The listeners’ task was to identify whether the first sound of the stimuli was “s” or “sh”. We tested two groups of native English listeners – one familiar with the speaker who produced the stimuli and one unfamiliar with the speaker.

The results showed that listeners familiar with the speaker identified the fricatives faster in all conditions (Figure 1) and more accurately in the visual-only condition (Figure 2). That is, listeners familiar with the speaker were more efficient in identifying the fricatives overall, and were more accurate when visual input was the only source of information.

We also examined whether visual familiarity affects the occurrence of the McGurk effect. Listeners were asked to identify syllable-initial stops (“b”, “d”, “g”) from stimuli that were audiovisual-congruent or incongruent (e.g., audio “ba” paired with visual “ga”). A blended (McGurk) response was indicated by a “da” response to an auditory “ba” paired with a visual “ga”.

Contrary to the “s”-“sh” findings reported earlier, the results from our identification task showed no difference between the familiar and unfamiliar listeners in the proportion of McGurk responses. This finding did not replicate Walker, Bruce, and O’Malley (1995).

In sum, familiarity with a speaker facilitated the speed of identifying fricatives from audiovisual stimuli. Familiarity also improved the accuracy of fricative identification when visual input was the only source of information. Although we did not find an effect of familiarity on the McGurk responses, our findings from the fricative task suggest that processing audiovisual speech is affected by speaker identity.

Figure 1- Reaction time of fricative identification from stimuli that were audio-only, visual-only, audiovisual-congruent, or audiovisual-incongruent. Error bars indicate 95% confidence intervals.

Figure 2- Accuracy of fricative identification (d’) from stimuli that were audio-only, visual-only, audiovisual-congruent, or audiovisual-incongruent (e.g., audio “save” paired with visual “shave”). Error bars indicate 95% confidence intervals.

Figure 3- Proportion of McGurk response (“da” response to audio “ba” paired with visual “ga”).

Video 1 – Example of an audiovisual-incongruent stimulus (audio “save” paired with visual “shave”).

Video 2 – Example of an audiovisual-incongruent stimulus (audio “ba” paired with visual “ga”).

References:

Bruce, V., & Young, A. (1986). Understanding face recognition. British Journal of Psychology, 77, 305-327.

Green, K. P., Kuhl, P. K., Meltzoff, A. N., & Stevens, E. B. (1991). Integrating speech information across talkers, gender, and sensory modality: Female faces and male voices in the McGurk effect. Perception & Psychophysics, 50, 524-536.

Lee, C.-Y., & Zhang, Y. (in press). Processing lexical and speaker information in repetition and semantic/associative priming. Journal of Psycholinguistic Research.

McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 26, 746-748.

Nygaard, L. C., & Pisoni, D. B. (1998). Talker-specific learning in speech perception. Perception & Psychophysics, 60, 355-376.

Walker, S., Bruce, V., & O’Malley, C. (1995). Facial identity and facial speech processing: Familiar faces and voices in the McGurk effect. Perception and Psychophysics, 57, 1124-1133.

Welch, R. B., & Warren, D. H. (1980). Immediate perceptual response to intersensory discrepancy. Psychological Bulletin, 88, 638-667.

4aSC2 – Effects of language and music experience on speech perception

T. Christina Zhao — zhaotc@uw.edu
Patricia K. Kuhl — pkkuhl@uw.edu
Institute for Learning & Brain Sciences
University of Washington, BOX 357988
Seattle, WA, 98195

Popular version of paper 4aSC2, “Top-down linguistic categories dominate over bottom-up acoustics in lexical tone processing”
Presented Thursday morning, May 21st, 2015, 8:00 AM, Ballroom 2
169th ASA Meeting, Pittsburgh

Speech perception involves constant interplay between top-down and bottom-up processing. For example, to process phonemes (e.g. ‘b’ from ‘p’), the listener must accurately process the acoustical information in the speech signals (i.e. bottom-up strategy) and assign these sounds efficiently to a category (i.e. top-down strategy). Listeners’ performance in speech perception tasks is influenced by their experience in either processing strategy. Here, we use lexical tone processing as a window to examine how extensive experience in both strategies influence speech perception.

Lexical tones are contrastive pitch contour patterns at the word level. That is, a small difference in the pitch contour can result in different word meaning. Native speakers of a tonal language thus have extensive experience in using the top-down strategy to assign highly variable pitch contours into lexical tone categories. This top-down influence is reflected by the reduced sensitivity to acoustic differences within a phonemic category compared to across categories (Halle, Chang, & Best, 2004). On the other hand, individuals with extensive music training early in life exhibit enhanced sensitivities to pitch differences not only in music, but also in speech, reflecting stronger bottom-up influence. Such bottom-up influence is reflected by the enhanced sensitivity in detecting differences between lexical tones when the listeners are non-tonal language speakers (Wong, Skoe, Russo, Dees, & Kraus, 2007).
How does extensive experience in both strategies influence lexical tone processing? To address this question, native Mandarin speakers with extensive music training (N=17) completed a music pitch discrimination task and a lexical tone discrimination task. We compared their performance with individuals with extensive experience in only one of the processing strategies (i.e. Mandarin nonmusicians (N=20) and English musicians (N=20), data from Zhao & Kuhl (2015)).

Despite the enhanced performance in the music pitch discrimination task in Mandarin musicians, their performance in the lexical tone discrimination task is similar to the performance of the Mandarin nonmusicians, and different from the English musicians’ performance (Fig. 1, ‘Sensitivity across lexical tone continuum by group’).

That is, they exhibited reduced sensitivities within phonemic categories (i.e. on either end of the line) compared to within categories (i.e. the middle of the line), and their overall performance is lower than the English musicians. This result strongly suggests a dominant effect of the top-down influence in processing lexical tone. Yet, further analyses revealed that Mandarin musicians and Mandarin nonmusicians may still be relying on different underlying mechanisms for performing in the lexical tone discrimination task. In the Mandarin musician, their music pitch discrimination scores are correlated with their lexical tone discrimination scores, suggesting a contribution of the bottom-up strategy in their lexical tone discrimination performance (Fig. 2, ‘Music pitch and lexical tone discrimination’, purple). This relation is similar to the English musicians (Fig. 2, peach) but very different from the Mandarin non-musicians (Fig. 2, yellow). Specifically, for Mandarin nonmusicians, the music pitch discrimination scores do not correlate with the lexical tone discrimination scores, suggesting independent processes.

Halle, P. A., Chang, Y. C., & Best, C. T. (2004). Identification and discrimination of Mandarin Chinese tones by Mandarin Chinese vs. French listeners. Journal of Phonetics, 32(3), 395-421. doi: 10.1016/s0095-4470(03)00016-0
Wong, P. C. M., Skoe, E., Russo, N. M., Dees, T., & Kraus, N. (2007). Musical experience shapes human brainstem encoding of linguistic pitch patterns. Nat. Neurosci., 10(4), 420-422. doi: 10.1038/nn1872
Zhao, T. C., & Kuhl, P. K. (2015). Effect of musical experience on learning lexical tone categories. The Journal of the Acoustical Society of America, 137(3), 1452-1463. doi: doi:http://dx.doi.org/10.1121/1.4913457

Can aliens found in museums teach us about learning sound categories?

Figure 1. The three aliens that my participants got to know over the course of the experiment. Each alien made a different combination of sounds, or no sounds at all.

4pSC11 – The role of familiarity in audiovisual speech perception

4aSC2 – Effects of language and music experience on speech perception

Search for papers by Acoustics Keyword