5aSCb17 – Pronunciation differences: Gender and ethnicity in Southern English

Wendy Herd – wherd@english.msstate.edu
Devan Torrence – dct74@msstate.edu
Joy Carino – carinoj16@themsms.org

Linguistics Research Laboratory
English Department
Mississippi State University
Mississippi State, MS 39762

Popular version of paper 5aSCb17, “Prevoicing differences in Southern English: Gender and ethnicity effects”
Presented Friday morning, May 27, 10:05 – 12:00 in Salon F
171st ASA Meeting, Salt Lake City

We often notice differences in pronunciation between ourselves and other speakers. More noticeable differences, like the Southern drawl or the New York City pronunciation yuge instead of huge, are even used overtly when we guess where a given speaker is from. Our speech also varies in more subtle ways.

If you hold your hand in front of your mouth when saying tot and dot aloud, you will be able to feel a difference in the onset of vocal fold vibration. Tot begins with a sound that lacks vocal fold vibration, so a large rush of air can be felt on the hand at the beginning of the word. No such rush of air can be felt at the beginning of dot because it begins with a sound with vocal fold vibration. A similar difference can be felt when comparing [p] of pot to [b] of bot and [k] of cot to [ɡ] of got. This difference between [t] and [d] is very noticeable, but the timing of our vocal fold vibration also varies each time we pronounce a different version of [t] or [d].

Our study is particularly focused, not on the large difference between sounds like [t] and [d], but on how speakers produce the smaller differences between different [d] pronunciations. For example, an English [d] might be pronounced with no vocal fold vibration before the [d] as shown in Figure 1(a) or with vocal fold vibration before the [d] as shown in Figure 1(b). As can be heard in the accompanying sound files, the difference between these two [d] pronunciations is less noticeable for English speakers than the difference between [t] and [d].

Pronunciation differences

Figure 1. Spectrogram of (a) dot with no vocal fold vibration before [d] and (b) dot with vocal fold vibration before [d]. (Only the first half of dot is shown.)

We compared the pronunciations of 40 native speakers of English from Mississippi to see if some speakers were more likely to vibrate their vocal folds before [b, d, ɡ] rather than shortly after those sounds. These speakers included equal numbers of African American participants (10 women, 10 men) and Caucasian American participants (10 women, 10 men).

Previous research found that men were more likely to vibrate their vocal folds before [b, d, ɡ] than women, but we found no such gender differences [1]. Men and women from Mississippi employed vocal fold vibration similarly. Instead, we found a clear effect of ethnicity. African American participants produced vocal fold vibration before initial [b, d, ɡ] 87% of the time while Caucasian American participants produced vocal fold vibration before these sounds just 37% of the time. This striking difference, which can be seen in Figure 2, is consistent with a previous smaller study that found ethnicity effects in vocal fold vibration among young adults from Florida [1, 2]. It is also consistent with descriptions of regional variation in vocal fold vibration [3].

Figure 2. Percentage of pronunciations produced with vocal fold vibration before [b, d, ɡ] displayed by ethnicity and gender.

The results suggest that these pronunciation differences are due to dialect variation. African American speakers from Mississippi appear to systematically use vocal fold vibration before [b, d, ɡ] to differentiate them from [p, t, k], but the Caucasian American speakers are using the cue differently and less frequently. Future research in the perception of these sounds could shed light on how speakers of different dialects vary in the way they interpret this cue. For example, if African American speakers are using this cue to differentiate [d] from [t], but Caucasian American speakers are using the same cue to add emphasis or to convey emotion, it is possible that listeners sometimes use these cues to (mis)interpret the speech of others without ever realizing it. We are currently attempting to replicate these results in other regions.

Each accompanying sound file contains two repetitions of the same word. The first repetition does not include fold vibration before the initial sound, and the second repetition does include vocal fold vibration before the initial sound.

  1. Ryalls, J., Zipprer, A., & Baldauff, P. (1997). A preliminary investigation of the effects of gender and race on voice onset time. Journal of Speech Language and Hearing, 40(3), 642-645.
  2. Ryalls, J., Simon, M., & Thomason, J. (2004). Voice onset time production in older Caucasian- and African-Americans. Journal of Multilingual Communication Disorders, 2(1), 61-67.
  3. Jacewicz, E., Fox, R.A., & Lyle, S. (2009). Variation in stop consonant voicing in two regional varieties of American English. Language Variation and Change, 39(3), 313-334.

2aSCb3 – How would you sketch a sound with your hands?

Hugo Scurto – Hugo.Scurto@ircam.fr
Guillaume Lemaitre – Guillaume.Lemaitre@ircam.fr
Jules Françoise – Jules.Francoise@ircam.fr
Patrick Susini – Patrick.Susini@ircam.fr
Frédéric Bevilacqua – Frederic.Bevilacqua@ircam.fr
Ircam
1 place Igor Stravinsky
75004 Paris, France

Popular version of paper 2aSCb3, “Combining gestures and vocalizations to imitate sounds”
Presented Tuesday morning, November 3, 2015, 10:30 AM in Grand Ballroom 8
170th ASA Meeting, Jacksonville

Scurto fig 1 - gestures

Figure 1. A person hears the sound of door squeaking and imitates it with vocalizations and gestures. Can the other person understand what he means?

Have you ever listened to an old Car Talk show? Here is what it sounded like on NPR back in 2010:

“So, when you start it up, what kind of noises does it make?
– It just rattles around for about a minute. […]
– Just like budublu-budublu-budublu?
– Yeah! It’s definitely bouncing off something, and then it stops”

As the example illustrates, it is often very complicated to describe a sound with words. But it is really easy to make it with our built-in sound-making system: the voice! In fact, we have observed earlier that this is exactly what people do: when we ask a person to communicate a sound to another person, she will very quickly try to recreate this noise with her voice – and also use a lot of gestures.

And this works! Communicating sounds with voice and gesture is much more effective than describing them with words and sentences. Imitations of sounds are fun, expressive, spontaneous, widespread in human communication, and very effective. These non-linguistic vocal utterances have been little studied, but nevertheless have the potential to provide researchers with new insights into several important questions in domains such as articulatory phonetics and auditory cognition.

The study we are presenting at this ASA meeting is part of a larger European project on how people imitate sounds with voice and gestures: SkAT-VG (“Sketching Audio Technologies with Voice and Gestures”, http://www.skatvg.eu): How do people produce vocal imitations (phonetics)? What are imitations made of (acoustics and gesture analysis)? How do other people interpret them (psychology)? The ultimate goal is to create “sketching” tools for sound designers (the persons that create the sounds of everyday products). If you are an architect and want to sketch a house, you can simply draw it on a sketchpad. But what do you do if you are a sound designer and want to rapidly sketch the sound of a new motorbike? Well, all that is available today are cumbersome pieces of software. Instead, the Skat-VG project aims to offer sound designers new tools that are as intuitive as a sketching pad: simply use their voice and gestures to control complex sound design tools. Therefore, the SkAT-VG project also conducts research in machine learning, sound synthesis, and studies how sound designers work.

Here at the ASA meeting, we are presenting a partial study in which we asked the question: “What do people use gestures for when they imitate a sound?” In fact, people use a lot of gestures, but we do not know what information these gestures convey: Are they redundant with the voice? Do they convey specific pieces of information that the voice cannot represent?

We first collected a huge database of vocal and gestural imitations. Then, we asked 50 participants to come to our lab and make vocal and gestural imitations for several hours. We recorded their voice, filmed them with a high-speed camera, and used a depth camera and accelerometers to measure their gestures. This resulted in a database of about 8000 imitations! This database is an unprecedented amount of material that now allows

We first analyzed the database qualitatively, by watching and annotating the videos. From this analysis, several hypotheses about the combination of gestures and vocalizations were drawn. Then, to test these hypotheses, we asked 20 participants to imitate 25 specially synthesized sounds with their voice and gestures.

The results showed a quantitative advantage of voice over gesture for communicating rhythmic information. Voice can reproduce accurately higher tempos than gestures, and is more precise than gestures when reproducing complex rhythmic patterns. We also found that people often use gestures in a metaphorical way, whereas voice reproduces some acoustic features of the sound. For instance, people shake their hands very rapidly whenever a sound is stable and noisy. This type of gesture does not really follow a feature of the sound: it simply means that the sound is noisy.

Overall, our study reveals the metaphorical function of gestures during sound imitation. Rather than following an acoustic characteristic, gestures expressively emphasize the vocalization and signal the most salient features. These results will inform the specifications of the SkAT-VG tools and make the tools more intuitive.

2pSCb11 – Effect of Menstrual Cycle Hormone Variations on Dichotic Listening Results

Richard Morris – Richard.morris@cci.fsu.edu
Alissa Smith

Florida State University
Tallahassee, Florida

Popular version of poster presentation 2pSCb11, “Effect of menstrual phase on dichotic listening”
Presented Tuesday afternoon, November 3, 2015, 3:30 PM, Grand Ballroom 8

How speech is processed by the brain has long been of interest to researchers and clinicians. One method to evaluate how the two sides of the brain work when hearing speech is called a dichotic listening task. In a dichotic listening task two words are presented simultaneously to a participant’s left and right ears via headphones. One word is presented to the left ear and a different one to the right ear. These words are spoken at the same pitch and loudness levels. The listener then indicates what word was heard. If the listener regularly reports hearing the words presented to one ear, then there is an ear advantage. Since most language processing occurs in the left hemisphere of the brain, most listeners attend more closely to the right ear. The regular selection of the word presented to the right ear is termed a right ear advantage (REA).

Previous researchers reported different responses from males and females to dichotic presentation of words. Those investigators found that males more consistently heard the word presented to the right ear and demonstrated a stronger REA. The female listeners in those studies exhibited more variability as to the ear of the word that was heard. Further research seemed to indicate that women exhibit different lateralization of speech processing at different phases of their menstrual cycle. In addition, data from recent studies indicate that the degree to which women can focus on the input to one ear or the other varies with their menstrual cycle.

However, the previous studies used a small number of participants. The purpose of the present study was to complete a dichotic listening study with a larger sample of female participants. In addition, the previous studies focused on women who did not take oral contraceptives as they were assumed to have smaller shifts in the lateralization of speech processing. Although this hypothesis is reasonable, it needs to be tested. For this study, it was hypothesized that the women would exhibit a greater REA during the days that they menstruate than during other days of their menstrual cycle. This hypothesis was based on the previous research reports. In addition, it was hypothesized that the women taking oral contraceptives will exhibit smaller fluctuations in the lateralization of their speech processing.

Participants in the study were 64 females, 19-25 years of age. Among the women 41 were taking oral contraceptives (OC) and 23 were not. The participants listened to the sound files during nine sessions that occurred once per week. All of the women were in good general health and had no speech, language, or hearing deficits.

The dichotic listening task was executed using the Alvin software package for speech perception research. The sound file consisted of consonant-vowel syllables comprised of the six plosive consonants /b/, /d/, /g/, /p/, /t/, and /k/ paired with the vowel “ah”. The listeners heard the syllables over stereo headphones. Each listener set the loudness of the syllables to a comfortable level.

At the beginning of the listening session, each participant wrote down the date of the initiation of her most recent menstrual period on a participant sheet identified by her participant number. Then, they heard the recorded syllables and indicated the consonant heard by striking that key on the computer keyboard. Each listening session consisted of three presentations of the syllables. There were different randomizations of the syllables for each presentation. In the first presentation, the stimuli will be presented in a non-forced condition. In this condition the listener indicted the plosive that she heard most clearly. After the first presentation, the experimental files were presented in a manner referred to as a forced left or right condition. In these two conditions the participant was directed to focus on the signal in the left or right ear. The sequence of focus on signal to the left ear or to the right ear was counterbalanced over the sessions.

The statistical analyses of the listeners’ responses revealed that no significant differences occurred between the women using oral contraceptives and those who did not. In addition, correlations between the day of the women’s menstrual cycle and their responses were consistently low. However, some patterns did emerge for the women’s responses across the experimental sessions as opposed to the days of their menstrual cycle. The participants in both groups exhibited a higher REA and lower percentage of errors for the final sessions in comparison to earlier sessions.

The results from the current subjects differ from those previously reported. Possibly the larger sample size of the current study, the additional month of data collection, or the data recording method affected the results. The larger sample size might have better represented how most women respond to dichotic listening tasks. The additional month of data collection may have allowed the women to learn how to respond to the task and then respond in a more consistent manner. The short data collection period may have confused the learning to respond to a novel task with a hormonally dependent response. Finally, previous studies had the experimenter record the subjects’ responses. That method of data recording may have added bias to the data collection. Further studies with large data sets and multiple months of data collection are needed to determine any sex and oral contraceptive use effects on REA.

4aSC2 – Effects of language and music experience on speech perception

T. Christina Zhao — zhaotc@uw.edu
Patricia K. Kuhl — pkkuhl@uw.edu
Institute for Learning & Brain Sciences
University of Washington, BOX 357988
Seattle, WA, 98195

Popular version of paper 4aSC2, “Top-down linguistic categories dominate over bottom-up acoustics in lexical tone processing”
Presented Thursday morning, May 21st, 2015, 8:00 AM, Ballroom 2
169th ASA Meeting, Pittsburgh

Speech perception involves constant interplay between top-down and bottom-up processing. For example, to process phonemes (e.g. ‘b’ from ‘p’), the listener must accurately process the acoustical information in the speech signals (i.e. bottom-up strategy) and assign these sounds efficiently to a category (i.e. top-down strategy). Listeners’ performance in speech perception tasks is influenced by their experience in either processing strategy. Here, we use lexical tone processing as a window to examine how extensive experience in both strategies influence speech perception.

Lexical tones are contrastive pitch contour patterns at the word level. That is, a small difference in the pitch contour can result in different word meaning. Native speakers of a tonal language thus have extensive experience in using the top-down strategy to assign highly variable pitch contours into lexical tone categories. This top-down influence is reflected by the reduced sensitivity to acoustic differences within a phonemic category compared to across categories (Halle, Chang, & Best, 2004). On the other hand, individuals with extensive music training early in life exhibit enhanced sensitivities to pitch differences not only in music, but also in speech, reflecting stronger bottom-up influence. Such bottom-up influence is reflected by the enhanced sensitivity in detecting differences between lexical tones when the listeners are non-tonal language speakers (Wong, Skoe, Russo, Dees, & Kraus, 2007).
How does extensive experience in both strategies influence lexical tone processing? To address this question, native Mandarin speakers with extensive music training (N=17) completed a music pitch discrimination task and a lexical tone discrimination task. We compared their performance with individuals with extensive experience in only one of the processing strategies (i.e. Mandarin nonmusicians (N=20) and English musicians (N=20), data from Zhao & Kuhl (2015)).

Despite the enhanced performance in the music pitch discrimination task in Mandarin musicians, their performance in the lexical tone discrimination task is similar to the performance of the Mandarin nonmusicians, and different from the English musicians’ performance (Fig. 1, ‘Sensitivity across lexical tone continuum by group’).
ZhaoFig1
That is, they exhibited reduced sensitivities within phonemic categories (i.e. on either end of the line) compared to within categories (i.e. the middle of the line), and their overall performance is lower than the English musicians. This result strongly suggests a dominant effect of the top-down influence in processing lexical tone. Yet, further analyses revealed that Mandarin musicians and Mandarin nonmusicians may still be relying on different underlying mechanisms for performing in the lexical tone discrimination task. In the Mandarin musician, their music pitch discrimination scores are correlated with their lexical tone discrimination scores, suggesting a contribution of the bottom-up strategy in their lexical tone discrimination performance (Fig. 2, ‘Music pitch and lexical tone discrimination’, purple). This relation is similar to the English musicians (Fig. 2, peach) but very different from the Mandarin non-musicians (Fig. 2, yellow). Specifically, for Mandarin nonmusicians, the music pitch discrimination scores do not correlate with the lexical tone discrimination scores, suggesting independent processes.

ZhaoFig2

Halle, P. A., Chang, Y. C., & Best, C. T. (2004). Identification and discrimination of Mandarin Chinese tones by Mandarin Chinese vs. French listeners. Journal of Phonetics, 32(3), 395-421. doi: 10.1016/s0095-4470(03)00016-0
Wong, P. C. M., Skoe, E., Russo, N. M., Dees, T., & Kraus, N. (2007). Musical experience shapes human brainstem encoding of linguistic pitch patterns. Nat. Neurosci., 10(4), 420-422. doi: 10.1038/nn1872
Zhao, T. C., & Kuhl, P. K. (2015). Effect of musical experience on learning lexical tone categories. The Journal of the Acoustical Society of America, 137(3), 1452-1463. doi: doi:http://dx.doi.org/10.1121/1.4913457

2aSC – Speech: An eye and ear affair!

Pamela Trudeau-Fisette – ptrudeaufisette@gmail.com
Lucie Ménard – menard.lucie@uqam.ca
Université du Quebec à Montréal
320 Ste-Catherine E.
Montréal, H3C 3P8

Popular version of poster session 2aSC, “Auditory feedback perturbation of vowel production: A comparative study of congenitally blind speakers and sighted speakers”
Presented Tuesday morning, May 19, 2015, Ballroom 2, 8:00 AM – 12:00 noon
169th ASA Meeting, Pittsburgh
———————————
When learning to speak, young infants and toddlers use auditory and visual cues to correctly associate speech movements to a specific speech sound. In doing so, typically developing children compare their own speech and those of their ambient language to build and improve the relationship between what they hear, see and feel, and how to produce it.

In many day-to-day situations, we exploit the multimodal nature of speech: in noisy environments, for instance like in a cocktail party, we look at our interlocutor’s face and use lip reading to recover speech sounds. When speaking clearly, we open our mouth wider to make ourself sound more intelligible. Sometimes, just seeing someone’s face is enough to communicate!

What happens in cases of congenital blindness? Despite the fact that blind speakers learn to produce intelligible speech, they do not quite speak like sighted speakers do. Since they do not perceive others’ visual cues, blind speakers do not produce visible labial movements as much as their sighted peers do.

Production of the French vowel “ou” (similar as in cool) produced by a sighted adult speaker (on the left) and a congenitally blind adult speaker (on the right). We can clearly see that the articulatory movements of the lips are more explicit for the sighted speaker.

Therefore, blind speakers put more weight on what they hear (auditory feedback) than sighted speakers, because one sensory input is lacking. How does that affect the way blind individuals speak?

To answer this question, we conducted an experiment during which we asked congenitally blind adult speakers and sighted adult speakers to produce multiple repetitions of the French vowel “eu”. While they were producing the 130 utterances, we gradually altered their auditory feedback through headphones – without them knowing it- so that they were not hearing the exact sound they were saying. Consequently, they needed to modify the way they produced the vowel in order to compensate for the acoustic manipulation, so they could hear the vowel they were asked to produce (and the one they thought they were saying all along!).

What we were interested in is whether blind speakers and sighted speakers would react differently to this auditory manipulation. The blind speakers not being able to rely on visual feedback, we hypothesized that they would grant more importance on their auditory feedback and, therefore, compensate to a greater extent for the acoustic manipulation.

To explore this matter, we observed the acoustic (produced sounds) and articulatory (lips and tongue movements) differences between the two groups at three distinct time points of the experiment phases.

As predicted, congenitally blind speakers compensated for the altered auditory feedback in a greater extent than their sighted peers. More specifically, even though both speaker groups adapted their productions, the blind group compensated more than the control group did, as if they were integrating the auditory information more strongly. Also, we found that both speaker groups used different articulatory strategies to respond to the applied manipulation: blind participants used their tongue (which is not visible when you speak) more to compensate. This latter observation is not surprising considering the fact that blind speakers do not use their lips (which is visible when you speak) as much as their sighted peers do.

Tags: speech, language, learning, vision, blindness

1pSC2 – Deciding to go (or not to go) to the party may depend as much on your memory as on your hearing

Kathy Pichora-Fuller – k.pichora.fuller@utoronto.ca
Department of Psychology, University of Toronto,
3359 Mississauga Road,
Mississauga, Ontario, CANADA L5L 1C6

Sherri Smith – Sherri.Smith@va.gov
Audiologic Rehabilitation Laboratory, Veterans Affairs Medical Center,
Mountain Home, Tennessee, UNITED STATES 37684

Popular version of paper 1pSC2 Effects of age, hearing loss and linguistic complexity on listening effort as measured by working memory span
Presented Monday afternoon, May 18, 2015 (Session: Listening Effort II)
169th ASA Meeting, Pittsburgh

Understanding conversation in noisy everyday situations can be a challenge for listeners, especially individuals who are older and/or hard-of-hearing. Listening in some everyday situations (e.g., at dinner parties) can be so challenging that people might even decide that they would rather stay home than go out. Eventually, avoiding these situations can damage relationships with family and friends and reduce enjoyment of and participation in activities. What are the reasons for these difficulties and why are some people affected more than other people?

How easy or challenging it is to listen may vary from person to person because some people have better hearing abilities and/or cognitive abilities compared to other people. The hearing abilities of some people may be affected by the degree or type of their hearing loss. The cognitive abilities of some people, for example how well they can attend to and remember what they have heard, can also affect how easy it is for them to follow conversation in challenging listening situations. In addition to hearing abilities, cognitive abilities seem to be particularly relevant because in many everyday listening situations people need to listen to more than one person talking at the same time and/or they may need to listen while doing something else such as driving a car or crossing a busy street. The auditory demands that a listener faces in a situation increase as background noise becomes louder or as more interfering sounds combine with each other. The cognitive demands in a situation increase when listeners need to keep track of more people talking or to divide their attention as they try to do more tasks at the same time. Both auditory and cognitive demands could result in the situation becoming very challenging and these demands may even totally overload a listener.

One way to measure information overload is to see how much a person remembers after they have completed a set of tasks. For several decades, cognitive psychologists have been interested in ‘working memory’, or a person’s limited capacity to process information while doing tasks and to remember information after the tasks have been completed. Like a bank account, the more cognitive capacity is spent on processing information while doing tasks, the less cognitive capacity will remain available for remembering and using the information later. Importantly, some people have bigger working memories than other people and people who have a bigger working memory are usually better at understanding written and spoken language. Indeed, many researchers have measured working memory span for reading (i.e., a task involving the processing and recall of visual information) to minimize ‘contamination’ from the effects of hearing loss that might be a problem if they measured working memory span for listening. However, variations in difficulty due to hearing loss may be critically important in assessing how the demands of listening affect different individuals when they are trying to understand speech in noise. Some researchers have studied the effects of the acoustical properties of speech and interfering noises on listening, but less is known about how variations in the type of language materials (words, sentences, stories) might alter listening demands for people who have hearing loss. Therefore, to learn more about why some people cope better when listening to conversation in noise, we need to discover how both their auditory and their cognitive abilities come into play during everyday listening for a range of spoken materials.

We predicted that speech understanding would be more highly associated with working memory span for listening than with listening span for reading, especially when more realistic language materials are used to measure speech understanding. To test these predictions, we conducted listening and reading tests of working memory and we also measured memory abilities using five other measures (three auditory memory tests and two visual memory tests). Speech understanding was measured with six tests (two tests with words, one in quiet and one in noise; three tests with sentences, one in quiet and two in noise; one test with stories in quiet). The tests of speech understanding using words and sentences were selected from typical clinical tests and involved simple immediate repetition of the words or sentences that were heard. The test using stories has been used in laboratory research and involved comprehension questions after the end of the story. Three groups with 24 people in each group were tested: one group of younger adults (mean age = 23.5 years) with normal hearing and two groups of older adults with hearing loss (one group with mean age = 66.3 years and the other group with mean age 74.3 years).

There was a wide range in performance on the listening test of working memory, but performance on the reading test of working memory was more limited and poorer. Overall, there was a significant correlation between the results on the reading and listening working memory measures. However, when correlations were conducted for each of the three groups separately, the correlation reached significance only for the oldest listeners with hearing loss; this group had lower mean scores on both tests. Surprisingly, for all three groups, there were no significant correlations among the working memory and speech understanding measures. To further investigate this surprising result, a factor analysis was conducted. The results of the factor analysis suggest that there was one factor including age, hearing test results and performance on speech understanding measures when the speech-understanding task was simply to repeat words or sentences – these seem to reflect auditory abilities. In addition, separate factors were found for performance on the speech understanding measures involving the comprehension of discourse or the use of semantic context in sentences – these seem to reflect linguistic abilities. Importantly, the majority of the memory measures were distinct from both kinds of speech understanding measures, and also a more basic and less cognitively demanding memory measure involving only the repetition of sets of numbers. Taken together, these findings suggest that working memory measures reflect differences between people in cognitive abilities that are distinct from those tapped by the sorts of simple measures of hearing and speech understanding that have been used in the clinic. Above and beyond current clinical tests, by testing working memory, especially listening working memory, useful information could be gained about why some people cope better than others in everyday challenging listening situations.

tags: age, hearing, memory, linguistics, speech