Listen to the Music: We Rely on Musical Genre to Determine Singers’ Accents

Maddy Walter – maddyw37@student.ubc.ca

The University of British Columbia, Department of Linguistics, Vancouver, British Columbia, V6T 1Z4, Canada

Additional authors:
Sydney Norris, Sabrina Luk, Marcell Maitinsky, Md Jahurul Islam, and Bryan Gick

Popular version of 3pPP6 – The Role of Genre Association in Sung Dialect Categorization
Presented at the 187th ASA Meeting
Read the abstract at https://eppro01.ativ.me//web/index.php?page=Session&project=ASAFALL24&id=3771321

–The research described in this Acoustics Lay Language Paper may not have yet been peer reviewed–


Have you ever listened to a song and later been surprised to hear the artist speak with a different accent than the one you heard in the song? Take country singer Keith Urban’s song “What About Me” for instance; when listening, you might assume that he has a Southern American (US) English accent. However, in his interviews, he speaks with an Australian English accent. So why did you think he sounded Southern?

Research suggests that specific accents or dialects are associated with musical genres [2], that singers adjust their accents based on genre [4], and that foreign accents are more difficult to recognize in song than in speech [5]. However, when listeners perceive an accent in a song, it is unclear which type of information they rely on: the acoustic speech information or information about the musical genre. Our previous research investigated this question for Country and Reggae music and found that genre recognition may play a larger role in dialect perception than the actual sound of the voice [9].

Our current study explores American Blues and Folk music, genres that allow for easier separation of vocals from instrumentals, using more refined stimulus manipulation. Blues is strongly associated with African American English [3], while Folk can be associated with a variety of dialects (British, American, etc.) [1]. Participants listened to manipulated clips of sung and “spoken” lines taken from songs in both genres, which were transcribed for participants (see Figure 1). AI applications were used to remove instrumentals from both sung and spoken clips, while “spoken” clips also underwent rhythm and pitch normalization so that they sounded like spoken rather than sung speech. After hearing each sung or spoken line, participants were asked to identify the dialect they heard from six options [7, 8] (see Figure 2).

Figure 1: Participant view of a transcript from a Folk song clip.
Figure 2: Participant view of six dialect options after hearing a clip.

Participants were much more confident and accurate in categorizing accents for clips in the Sung condition, regardless of genre. The proportion of uncertainty (“Not Sure” responses) in the Spoken condition was consistent across genres (see “D” in Figure 3), suggesting that participants were more certain of dialect when musical cues were present. Dialect categories followed genre expectations, as can be seen from the increase in identifying African American English for Blues in the Sung condition (see “A”). Removing uncertainty by adding genre cues did not increase the likelihood of “Irish English” or “British English” being chosen for Blues, though it did for Folk (see “B” and “C” in Figure 3), in line with genre-based expectations.

Figure 3: Participant dialect responses.
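As a rough illustration of how response distributions like those in Figure 3 can be tabulated, the sketch below counts dialect choices per genre and condition and converts them to proportions. The trial records are invented for illustration and are not the study's data; the function name `response_proportions` is hypothetical, and only dialect labels mentioned in the text above are used.

```python
from collections import Counter

# Hypothetical trial records: (genre, condition, chosen dialect label).
# These rows are invented for illustration and are NOT the study's data.
trials = [
    ("Blues", "Sung", "African American English"),
    ("Blues", "Sung", "African American English"),
    ("Blues", "Sung", "Not Sure"),
    ("Blues", "Spoken", "Not Sure"),
    ("Blues", "Spoken", "African American English"),
    ("Folk", "Sung", "Irish English"),
    ("Folk", "Sung", "British English"),
    ("Folk", "Sung", "British English"),
    ("Folk", "Spoken", "Not Sure"),
    ("Folk", "Spoken", "Not Sure"),
    ("Folk", "Spoken", "British English"),
]


def response_proportions(records):
    """Return {(genre, condition): {label: proportion of responses}}."""
    counts = {}
    for genre, condition, label in records:
        counts.setdefault((genre, condition), Counter())[label] += 1
    return {
        cell: {label: n / sum(c.values()) for label, n in c.items()}
        for cell, c in counts.items()
    }


proportions = response_proportions(trials)
for cell, distribution in sorted(proportions.items()):
    print(cell, distribution)
```

Comparing the “Not Sure” proportion between the Sung and Spoken cells of each genre is one simple way to read off the uncertainty pattern described above.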

These findings enhance our understanding of the relationship between musical genre and accent. Referring again to the example of Keith Urban, the singer’s stylistic accent change may not be the only culprit for our interpretation of a Southern drawl. Rather, we may have assumed we were listening to a musician with a Southern American English accent when we heard the first banjo-like twang or tuned in to iHeartCountry Radio. When we listen to a song and perceive a singer’s accent, we are not only listening to the sounds of their speech, but are also shaping our perception from our expectations of dialect based on the musical genre.

References:

  1. Carrigan, J., Henry L. (2004). Lornell, Kip. The NPR curious listener’s guide to American folk music. Library Journal (1976), 129(19), 63.
  2. Coupland, N. (2011). Voice, place and genre in popular song performance. Journal of Sociolinguistics, 15(5), 573–602. https://doi.org/10.1111/j.1467-9841.2011.00514.x.
  3. De Timmerman, Romeo, et al. (2024). The globalization of local indexicalities through music: African‐American English and the blues. Journal of Sociolinguistics, 28(1), 3–25. https://doi.org/10.1111/josl.12616.
  4. Gibson, A. M. (2019). Sociophonetics of popular music: insights from corpus analysis and speech perception experiments [Doctoral dissertation, University of Canterbury]. http://dx.doi.org/10.26021/4007.
  5. Mageau, M., Mekik, C., Sokalski, A., & Toivonen, I. (2019). Detecting foreign accents in song. Phonetica, 76(6), 429–447. https://doi.org/10.1159/000500187.
  6. RStudio. (2020). RStudio: Integrated Development for R. RStudio, PBC, Boston, MA. http://www.rstudio.com/.
  7. Stoet, G. (2010). PsyToolkit – A software package for programming psychological experiments using Linux. Behavior Research Methods, 42(4), 1096-1104.
  8. Stoet, G. (2017). PsyToolkit: A novel web-based method for running online questionnaires and reaction-time experiments. Teaching of Psychology, 44(1), 24-31.
  9. Walter, M., Bengtson, G., Maitinsky, M., Islam, M. J., & Gick, B. (2023). Dialect perception in song versus speech. The Journal of the Acoustical Society of America, 154(4_supplement), A161. https://doi.org/10.1121/10.0023131.

3pSC87 – What the f***? Making sense of expletives in The Wire

Erica Gold – e.gold@hud.ac.uk
Dan McIntyre – d.mcintyre@hud.ac.uk

University of Huddersfield
Queensgate
Huddersfield, HD1 3DH
United Kingdom

Popular version of 3pSC87 – What the f***: An acoustic-pragmatic analysis of meaning in The Wire
Presented Wednesday afternoon, November 30, 2016
172nd ASA Meeting, Honolulu


In season one of HBO’s acclaimed crime drama The Wire, Detectives Jimmy McNulty and ‘Bunk’ Moreland are investigating old homicide cases, including the murder of a young woman shot dead in her apartment. McNulty and Bunk visit the scene of the crime to try to figure out exactly how the woman was killed. What makes the scene dramatically unusual is that, engrossed in their investigation, the two detectives communicate with each other using only the word “fuck” and its variants (e.g. motherfucker, fuckity fuck, etc.). Somehow, using only this vocabulary, McNulty and Bunk are able to communicate in a meaningful way. The scene is absorbing, engaging and even funny, and it leads to a fascinating question for linguists: how is the viewer able to understand what McNulty and Bunk mean when they communicate using such a restricted set of words?

Photo: Bunk and McNulty

To investigate this, we first looked at what other linguists have discovered about the word fuck. What is clear is that it’s a hugely versatile word that can be used to express a range of attitudes and emotions. On the basis of this research, we came up with a classification scheme which we then used to categorise all the variants of fuck in the scene. Some seemed to convey disbelief and some were used as insults. Some indicated surprise or realization while others functioned to intensify the following word. And some were idiomatic set phrases (e.g. Fuckin’ A!). Our next step was to see whether there was anything in the acoustic properties of the characters’ speech that would allow us to explain why we interpreted the fucks in the way that we did.

The entire conversation between Bunk and McNulty lasts around three minutes and contains a total of 37 fuck productions (i.e. variations of fuck). Due to the variation in the fucks produced, the one clear and consistent segment for each word was the <u> in fuck. Consequently, this became the focus of our study. The <u> in fuck is the same sound you find in the word strut or duck and is represented as /ʌ/ in the International Phonetic Alphabet. When analysing vowel sounds, such as <u>, we can look at a number of aspects of its production.

In this study, we looked at the quality of the vowel by measuring the first three formants. In phonetics, the term formant refers to acoustic resonances of sound in the vocal tract. The first two formants can tell us whether the production sounds more like “fuck” than “feck” or “fack,” and the third formant gives us information about the voice quality. We also looked at the duration of the <u> being produced: “fuuuuuck” versus “fuck.”

After measuring each instance, we ran statistical tests to see whether there was any relationship between the way in which it was said and how we categorised its meaning. Our results showed that, once we accounted for the differences in the vocal tract shapes of the actors playing Bunk and McNulty, the quality of the vowels is relatively consistent. That is, we get a lot of <u> sounds, rather than “eh,” “oo” or “ih.”
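One common way to account for differences between speakers’ vocal tracts is to z-score each formant within each speaker (often called Lobanov normalization). The text above does not say which method was used, so the sketch below is only an assumed illustration of the general idea. The formant values are invented, and the helper name `lobanov` is hypothetical.

```python
from statistics import mean, stdev

# Hypothetical raw formant measurements in Hz: speaker -> list of (F1, F2).
# Values are invented for illustration and are NOT measurements from the scene.
raw = {
    "Bunk": [(620.0, 1180.0), (650.0, 1220.0), (600.0, 1150.0)],
    "McNulty": [(700.0, 1350.0), (730.0, 1400.0), (680.0, 1310.0)],
}


def lobanov(tokens):
    """Z-score each formant within one speaker's tokens.

    Assumption: the sample standard deviation is used. In practice the mean
    and standard deviation are usually computed over a speaker's whole vowel
    space, not over tokens of a single vowel as in this toy data.
    """
    columns = list(zip(*tokens))  # [(all F1 values), (all F2 values)]
    stats = [(mean(col), stdev(col)) for col in columns]
    return [
        tuple((value - m) / s for value, (m, s) in zip(token, stats))
        for token in tokens
    ]


normalized = {speaker: lobanov(tokens) for speaker, tokens in raw.items()}
for speaker, tokens in normalized.items():
    print(speaker, [tuple(round(v, 2) for v in t) for t in tokens])
```

After this step each speaker’s values are expressed relative to that speaker’s own average, so tokens from the two actors can be compared on a common scale.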

The productions of fucks that were associated with the category of realization were found to be very similar to those associated with disbelief. However, disbelief and realization did contrast with those that were used as insults, idiomatic phrases, or functional words. Therefore, it may be more appropriate to classify the meaning into fewer categories: those that signify disbelief or realization, and those that are idiomatic, insults, or functional. It is important to remember, however, that the latter group of three meanings is represented by fewer examples in the scene. Our initial results show that these two broad groups may be distinguished through the length of the vowel: a short <u> is more associated with an insult, function, or idiomatic use than with disbelief or surprise (for which the vowel tends to be longer). In the future, we would also like to analyse the intonation of the productions. See if you can hear the difference between these samples:

Example 1: realization/surprise

Example 2: general expletive which falls under the functional/idiomatic/insult category
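The duration contrast between the two broad groups could be checked with a simple comparison of group means. The sketch below uses a permutation test, which is an assumed choice because the text does not name the statistical tests that were run. The durations are invented for illustration, and `permutation_p_value` is a hypothetical helper.

```python
import random
from statistics import mean

# Hypothetical vowel durations in milliseconds, invented for illustration.
long_group = [182.0, 210.0, 165.0, 240.0, 198.0]   # disbelief / realization
short_group = [95.0, 110.0, 88.0, 120.0]           # insult / idiomatic / functional


def permutation_p_value(a, b, n_permutations=5000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = a + b  # new list, so shuffling does not alter the inputs
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return hits / n_permutations


p_value = permutation_p_value(long_group, short_group)
print("mean long:", mean(long_group), "mean short:", mean(short_group))
print("permutation p-value:", p_value)
```

With so few tokens in the second group, any such test has limited power, which matches the caution above about that group being represented by fewer examples.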

Our results shed new light on what for linguists is an old problem: how do we make sense of what people say when speakers so very rarely say exactly what they mean? Experts in pragmatics (the study of how meaning is affected by context) have suggested that we infer meaning when people break conversational norms. In the example from The Wire, it’s clear that the characters are breaking normal communicative conventions. But pragmatic methods of analysis don’t get us very far in explaining how we are able to infer such a range of meaning from such limited vocabulary. Our results confirm that the answer to this question is that meaning is not just conveyed at the lexical and pragmatic level, but at the phonetic level too. It’s not just what we say that’s important, it’s how we fucking say it!

*all photos are from HBO.com

4aSC2 – Effects of language and music experience on speech perception

T. Christina Zhao — zhaotc@uw.edu
Patricia K. Kuhl — pkkuhl@uw.edu
Institute for Learning & Brain Sciences
University of Washington, BOX 357988
Seattle, WA, 98195

Popular version of paper 4aSC2, “Top-down linguistic categories dominate over bottom-up acoustics in lexical tone processing”
Presented Thursday morning, May 21st, 2015, 8:00 AM, Ballroom 2
169th ASA Meeting, Pittsburgh

Speech perception involves constant interplay between top-down and bottom-up processing. For example, to distinguish phonemes (e.g. ‘b’ from ‘p’), the listener must accurately process the acoustical information in the speech signal (i.e. bottom-up strategy) and assign these sounds efficiently to a category (i.e. top-down strategy). Listeners’ performance in speech perception tasks is influenced by their experience in either processing strategy. Here, we use lexical tone processing as a window to examine how extensive experience in both strategies influences speech perception.

Lexical tones are contrastive pitch contour patterns at the word level. That is, a small difference in the pitch contour can result in different word meaning. Native speakers of a tonal language thus have extensive experience in using the top-down strategy to assign highly variable pitch contours into lexical tone categories. This top-down influence is reflected by the reduced sensitivity to acoustic differences within a phonemic category compared to across categories (Halle, Chang, & Best, 2004). On the other hand, individuals with extensive music training early in life exhibit enhanced sensitivities to pitch differences not only in music, but also in speech, reflecting stronger bottom-up influence. Such bottom-up influence is reflected by the enhanced sensitivity in detecting differences between lexical tones when the listeners are non-tonal language speakers (Wong, Skoe, Russo, Dees, & Kraus, 2007).
How does extensive experience in both strategies influence lexical tone processing? To address this question, native Mandarin speakers with extensive music training (N=17) completed a music pitch discrimination task and a lexical tone discrimination task. We compared their performance with individuals with extensive experience in only one of the processing strategies (i.e. Mandarin nonmusicians (N=20) and English musicians (N=20), data from Zhao & Kuhl (2015)).

Despite the enhanced performance in the music pitch discrimination task in Mandarin musicians, their performance in the lexical tone discrimination task is similar to the performance of the Mandarin nonmusicians, and different from the English musicians’ performance (Fig. 1, ‘Sensitivity across lexical tone continuum by group’).
Figure 1: Sensitivity across lexical tone continuum by group.
That is, they exhibited reduced sensitivity within phonemic categories (i.e. on either end of the line) compared to across categories (i.e. the middle of the line), and their overall performance is lower than that of the English musicians. This result strongly suggests a dominant effect of the top-down influence in processing lexical tone. Yet further analyses revealed that Mandarin musicians and Mandarin nonmusicians may still be relying on different underlying mechanisms in the lexical tone discrimination task. For the Mandarin musicians, music pitch discrimination scores are correlated with lexical tone discrimination scores, suggesting a contribution of the bottom-up strategy to their lexical tone discrimination performance (Fig. 2, ‘Music pitch and lexical tone discrimination’, purple). This relation is similar to that of the English musicians (Fig. 2, peach) but very different from that of the Mandarin nonmusicians (Fig. 2, yellow). Specifically, for Mandarin nonmusicians, the music pitch discrimination scores do not correlate with the lexical tone discrimination scores, suggesting independent processes.
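The group-by-group relation between the two tasks can be illustrated with a Pearson correlation computed separately for each group. This is a minimal sketch under stated assumptions: the scores are invented, the text does not specify which correlation statistic was used, and `pearson_r` is a hypothetical helper written out in full so the calculation is visible.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / sqrt(sxx * syy)


# Hypothetical discrimination scores, invented for illustration only.
groups = {
    "Mandarin musicians": {
        "music": [2.1, 2.6, 3.0, 3.4, 3.9],
        "tone": [1.2, 1.5, 1.6, 2.0, 2.1],
    },
    "Mandarin nonmusicians": {
        "music": [1.0, 1.4, 1.9, 2.2, 2.8],
        "tone": [1.6, 1.3, 1.7, 1.4, 1.5],
    },
}

correlations = {
    name: pearson_r(scores["music"], scores["tone"])
    for name, scores in groups.items()
}
for name, r in correlations.items():
    print(f"{name}: r = {r:.2f}")
```

A correlation near zero in one group alongside a clearly positive one in another is the kind of pattern that would point to different underlying mechanisms, as described above.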

Figure 2: Music pitch and lexical tone discrimination.

Halle, P. A., Chang, Y. C., & Best, C. T. (2004). Identification and discrimination of Mandarin Chinese tones by Mandarin Chinese vs. French listeners. Journal of Phonetics, 32(3), 395-421. doi: 10.1016/s0095-4470(03)00016-0
Wong, P. C. M., Skoe, E., Russo, N. M., Dees, T., & Kraus, N. (2007). Musical experience shapes human brainstem encoding of linguistic pitch patterns. Nature Neuroscience, 10(4), 420-422. doi: 10.1038/nn1872
Zhao, T. C., & Kuhl, P. K. (2015). Effect of musical experience on learning lexical tone categories. The Journal of the Acoustical Society of America, 137(3), 1452-1463. doi: 10.1121/1.4913457

1pSC2 – Deciding to go (or not to go) to the party may depend as much on your memory as on your hearing

Kathy Pichora-Fuller – k.pichora.fuller@utoronto.ca
Department of Psychology, University of Toronto,
3359 Mississauga Road,
Mississauga, Ontario, CANADA L5L 1C6

Sherri Smith – Sherri.Smith@va.gov
Audiologic Rehabilitation Laboratory, Veterans Affairs Medical Center,
Mountain Home, Tennessee, UNITED STATES 37684

Popular version of paper 1pSC2 Effects of age, hearing loss and linguistic complexity on listening effort as measured by working memory span
Presented Monday afternoon, May 18, 2015 (Session: Listening Effort II)
169th ASA Meeting, Pittsburgh

Understanding conversation in noisy everyday situations can be a challenge for listeners, especially individuals who are older and/or hard-of-hearing. Listening in some everyday situations (e.g., at dinner parties) can be so challenging that people might even decide that they would rather stay home than go out. Eventually, avoiding these situations can damage relationships with family and friends and reduce enjoyment of and participation in activities. What are the reasons for these difficulties and why are some people affected more than other people?

How easy or challenging it is to listen may vary from person to person because some people have better hearing abilities and/or cognitive abilities compared to other people. The hearing abilities of some people may be affected by the degree or type of their hearing loss. The cognitive abilities of some people, for example how well they can attend to and remember what they have heard, can also affect how easy it is for them to follow conversation in challenging listening situations. In addition to hearing abilities, cognitive abilities seem to be particularly relevant because in many everyday listening situations people need to listen to more than one person talking at the same time and/or they may need to listen while doing something else such as driving a car or crossing a busy street. The auditory demands that a listener faces in a situation increase as background noise becomes louder or as more interfering sounds combine with each other. The cognitive demands in a situation increase when listeners need to keep track of more people talking or to divide their attention as they try to do more tasks at the same time. Both auditory and cognitive demands could result in the situation becoming very challenging and these demands may even totally overload a listener.

One way to measure information overload is to see how much a person remembers after they have completed a set of tasks. For several decades, cognitive psychologists have been interested in ‘working memory’, or a person’s limited capacity to process information while doing tasks and to remember information after the tasks have been completed. Like a bank account, the more cognitive capacity is spent on processing information while doing tasks, the less cognitive capacity will remain available for remembering and using the information later. Importantly, some people have bigger working memories than other people and people who have a bigger working memory are usually better at understanding written and spoken language. Indeed, many researchers have measured working memory span for reading (i.e., a task involving the processing and recall of visual information) to minimize ‘contamination’ from the effects of hearing loss that might be a problem if they measured working memory span for listening. However, variations in difficulty due to hearing loss may be critically important in assessing how the demands of listening affect different individuals when they are trying to understand speech in noise. Some researchers have studied the effects of the acoustical properties of speech and interfering noises on listening, but less is known about how variations in the type of language materials (words, sentences, stories) might alter listening demands for people who have hearing loss. Therefore, to learn more about why some people cope better when listening to conversation in noise, we need to discover how both their auditory and their cognitive abilities come into play during everyday listening for a range of spoken materials.

We predicted that speech understanding would be more highly associated with working memory span for listening than with working memory span for reading, especially when more realistic language materials are used to measure speech understanding. To test these predictions, we conducted listening and reading tests of working memory and we also measured memory abilities using five other measures (three auditory memory tests and two visual memory tests). Speech understanding was measured with six tests (two tests with words, one in quiet and one in noise; three tests with sentences, one in quiet and two in noise; one test with stories in quiet). The tests of speech understanding using words and sentences were selected from typical clinical tests and involved simple immediate repetition of the words or sentences that were heard. The test using stories has been used in laboratory research and involved comprehension questions after the end of the story. Three groups with 24 people in each group were tested: one group of younger adults (mean age = 23.5 years) with normal hearing and two groups of older adults with hearing loss (one group with mean age = 66.3 years and the other group with mean age = 74.3 years).

There was a wide range in performance on the listening test of working memory, but performance on the reading test of working memory was more limited and poorer. Overall, there was a significant correlation between the results on the reading and listening working memory measures. However, when correlations were conducted for each of the three groups separately, the correlation reached significance only for the oldest listeners with hearing loss; this group had lower mean scores on both tests. Surprisingly, for all three groups, there were no significant correlations between the working memory and speech understanding measures. To further investigate this surprising result, a factor analysis was conducted. The results of the factor analysis suggest that there was one factor including age, hearing test results and performance on speech understanding measures when the speech-understanding task was simply to repeat words or sentences – these seem to reflect auditory abilities. In addition, separate factors were found for performance on the speech understanding measures involving the comprehension of discourse or the use of semantic context in sentences – these seem to reflect linguistic abilities. Importantly, the majority of the memory measures were distinct from both kinds of speech understanding measures, and also from a more basic and less cognitively demanding memory measure involving only the repetition of sets of numbers. Taken together, these findings suggest that working memory measures reflect differences between people in cognitive abilities that are distinct from those tapped by the sorts of simple measures of hearing and speech understanding that have been used in the clinic. Above and beyond current clinical tests, testing working memory, especially listening working memory, could yield useful information about why some people cope better than others in everyday challenging listening situations.
