2aMU5 – Do people find vocal fry in popular music expressive?

Mackenzie Parrott – mackenzie.lanae@gmail.com
John Nix – john.nix@utsa.edu

Popular version of paper 2aMU5, “Listener Ratings of Singer Expressivity in Musical Performance.”
Presented Tuesday, May 24, 2016, 10:20-10:35 am, Salon B/C, ASA meeting, Salt Lake City

Vocal fry is the lowest register of the human voice.  Its distinct sound is characterized by a low rumble interspersed with uneven popping and crackling.  The use of fry as a vocal mannerism is becoming increasingly common in American speech, fueling discussion about the implications of its use and how listeners perceive the speaker [1].  Previous studies have suggested that listeners find vocal fry to be generally unpleasant in women’s speech, but associate it with positive characteristics in men’s speech [2].

As fry has become more prevalent in speech, it has, perhaps not surprisingly, found a place in many commercial song styles as well.  Many singers use fry as a stylistic device at the onset or offset of a sung tone.  The device can be heard readily in popular musical styles, presumably to heighten and amplify the emotion that the performer is attempting to convey.

Researchers at the University of Texas at San Antonio conducted a survey to examine whether listeners’ ratings of a singer’s expressivity in musical samples from two contemporary commercial styles (pop and country) were affected by the presence of vocal fry, and whether listener ratings differed according to the singer’s gender.  A male and a female singer recorded musical samples for the study in a noise-reduction booth.  As shown in the table below, the singers were asked to sing most of the musical selections twice, once using vocal fry at phrase onsets and once without fry, while maintaining the same vocal quality, tempo, dynamics, and stylization.  Some samples were presented more than once in the survey portion of the study to test listener reliability.

Song                        | Singer Gender | Vocal Mode
(Hit Me) Baby One More Time | Female        | Fry Only
If I Die Young              | Female        | With and Without Fry
National Anthem             | Female        | With and Without Fry
Thinking Out Loud           | Male          | Without Fry Only
Amarillo By Morning         | Male          | With and Without Fry
National Anthem             | Male          | With and Without Fry

Across all listener ratings of all the songs, the recordings that included vocal fry were rated as only slightly more expressive than the recordings that contained no vocal fry.  Comparing the use of fry between the male and female singer revealed some differences.  Listeners rated the female singer’s samples with vocal fry higher (i.e., more expressive) than those without fry, which was surprising given the negative association with women using vocal fry in speech.  Conversely, listeners rated the male singer’s samples without fry as more expressive than those with fry.  Part of this pattern may also reflect the singers themselves: the male singer was much more experienced with pop styles than the female singer, who was primarily classically trained.  The overall expressivity ratings for the male singer were higher than those for the female singer by a statistically significant margin.

There were also trends in listener ratings across the age groups of participants.  Younger listeners widened both gaps, rating the female singer’s performances with fry even higher relative to her performances without fry, and the male singer’s performances without fry even higher relative to his performances with fry.  Presumably, younger listeners are more attuned to the stylistic norms of current pop singers, though this could also suggest a gender bias among younger listeners.  The older listener groups rated the mean expressivity of the performers lower than the younger listener groups did.  Since most of the sampled songs are fairly recent, this may indicate a generational trend in preference: perhaps listeners rate the style of vocal production most similar to what they listened to during their young adult years as the most expressive style of singing.  These findings have raised many questions for further studies of vocal fry in pop and country music.

 

  1. Anderson, R. C., Klofstad, C. A., Mayew, W. J., and Venkatachalam, M. “Vocal Fry May Undermine the Success of Young Women in the Labor Market.” PLoS ONE, 2014. 9(5): e97506. doi:10.1371/journal.pone.0097506.
  2. Yuasa, I. P. “Creaky Voice: A New Feminine Voice Quality for Young Urban-Oriented Upwardly Mobile American Women.” American Speech, 2010. 85(3): 315-337.

1aPP44 – What’s That Noise? The Effect of Hearing Loss and Tinnitus on Soldiers Using Military Headsets

Candice Manning, AuD, PhD – Candice.Manning@va.gov
Timothy Mermagen, BS – timothy.j.mermagen.civ@mail.mil
Angelique Scharine, PhD – angelique.s.scharine.civ@mail.mil

Human and Intelligent Agent Integration Branch (HIAI)
Human Research and Engineering Directorate
U.S. Army Research Laboratory
Building 520
Aberdeen Proving Ground, MD

Lay language paper 1aPP44, “Speech recognition performance of listeners with normal hearing, sensorineural hearing loss, and sensorineural hearing loss and bothersome tinnitus when using air and bone conduction communication headsets”
Presented Monday morning, May 23, 2016, 8:00 am – 12:00 pm, Salon E/F
171st ASA Meeting, Salt Lake City

Military personnel are at high risk for noise-induced hearing loss due to the unprecedented amount of blast-related acoustic trauma experienced during deployment, along with high-level impulsive and continuous noise from sources such as transport vehicles and weaponry.  In fact, noise-induced hearing loss is the primary injury of United States Soldiers returning from Afghanistan and Iraq.  Ear injuries, including tympanic membrane perforation, hearing loss, and tinnitus, greatly affect a Soldier’s hearing acuity and, as a result, reduce situational awareness and readiness.  Hearing protection devices are accessible to military personnel; however, many troops forgo their use, believing that protection may decrease situational awareness during combat.

Noise-induced hearing loss is highly associated with tinnitus, the perception of sound that is not produced by a source outside the body.  Chronic tinnitus causes functional impairment that may lead a sufferer to seek help from an audiologist or other healthcare professional.  Intervention and management are the only options for individuals suffering from chronic tinnitus, as there is no cure for the condition.  Tinnitus affects every aspect of an individual’s life, including sleep, daily tasks, relaxation, and conversation, to name only a few.  In 2011, a United States Government Accountability Office report on noise indicated that tinnitus was the most prevalent service-connected disability.  The combination of noise-induced hearing loss and the perception of tinnitus could greatly impair a Soldier’s ability to rapidly and accurately process speech information under high-stress situations.

The prevalence of hearing loss and tinnitus within the military population makes Soldier use of hearing protection extremely important, and integrating hearing protection into reliable communication devices should increase the probability that Soldiers will use it.  Military communication devices based on air or bone conduction provide clear two-way audio communications through a headset and a microphone.

Air-conduction headsets offer passive hearing protection from high ambient noise, and talk-through microphones allow the user to engage in face-to-face conversation and hear ambient environmental sounds, preserving situational awareness.  Bone-conduction headsets instead deliver auditory information through the bone-conduction pathway, presenting sound differently than air-conduction devices (see Figure 1).  Because headsets with bone-conduction transducers do not cover the ears, they let the user hear the surrounding environment while retaining the option to communicate over a radio network.  Worn with or without hearing protection, bone-conduction devices are inconspicuous and fit easily under the helmet.  Bone-conduction communication devices have been used in the past; however, newer designs have not yet been widely adopted for military applications.


Figure 1. Air and Bone conduction headsets used during study: a) Invisio X5 dual in-ear headset and X50 control unit and b) Aftershockz Sports 2 headset.

Since many military personnel operate in high-noise environments, often with some degree of noise-induced hearing damage and/or tinnitus, it is important to understand how speech recognition performance might be altered as a function of headset use.  The question matters because two auditory pathways, the air-conduction pathway and the bone-conduction pathway, are responsible for hearing perception.  Comparing the air- and bone-conduction devices across different hearing populations helps describe the overall effects not only of hearing loss, an extremely common disability within the military population, but also of tinnitus on situational awareness.  Additionally, if there are differences between the two types of headsets, this information will help guide future communication device selection for each type of population: normal hearing (NH), sensorineural hearing loss (SNHL), and SNHL with tinnitus.

Based on findings from the speech-understanding-in-noise literature, communication devices do have a negative effect on speech intelligibility within the military population when noise is present.  However, it is uncertain how hearing loss and/or tinnitus affect speech intelligibility and situational awareness in high-level noise environments.  This study measured recognition of words presented over air-conduction (AC) and bone-conduction (BC) headsets in three groups of listeners: those with normal hearing, those with sensorineural hearing loss, and those with sensorineural hearing loss and bothersome tinnitus.  Three speech-to-noise ratios (SNR = 0, −6, and −12 dB) were created by embedding the speech items in pink noise.  Overall, performance was marginally but significantly better with the Aftershockz bone-conduction headset (Figure 2).  As would be expected, performance increased as the speech-to-noise ratio increased (Figure 3).
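The paper does not say how the pink-noise maskers were generated or mixed, so the sketch below is only one plausible way to do it in Python; the function names and the spectral-shaping approach are my own, not the authors’, and `speech` stands in for a hypothetical speech waveform.

```python
import numpy as np

def pink_noise(n_samples, seed=0):
    """Approximate pink (1/f) noise by shaping white noise in the frequency domain."""
    rng = np.random.default_rng(seed)
    spectrum = np.fft.rfft(rng.standard_normal(n_samples))
    freqs = np.fft.rfftfreq(n_samples)
    freqs[0] = freqs[1]                    # avoid dividing by zero at DC
    spectrum = spectrum / np.sqrt(freqs)   # 1/f power ~ 1/sqrt(f) amplitude
    return np.fft.irfft(spectrum, n=n_samples)

def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so that 10*log10(P_speech / P_noise) equals snr_db, then mix."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise

# The three study conditions, given a speech vector `speech` (hypothetical):
# mixes = {snr: mix_at_snr(speech, pink_noise(len(speech)), snr) for snr in (0, -6, -12)}
```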


Figure 2. Mean rationalized arcsine units measured for each of the tactical communication and protective systems (TCAPS) under test.
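The figures report scores in rationalized arcsine units (RAU), a transform introduced by Studebaker (1985) that makes percent-correct scores more comparable across the scale.  As commonly stated (background, not a formula taken from this paper), with X correct responses out of N items:

\[ \mathrm{RAU} = \frac{146.18}{\pi}\left[\arcsin\sqrt{\frac{X}{N+1}} + \arcsin\sqrt{\frac{X+1}{N+1}}\right] - 23 \]

This maps 0–100% onto roughly −23 to +123 while stabilizing the variance near the extremes of the scale.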


Figure 3. Mean rationalized arcsine units measured as a function of speech to noise ratio.

One of the most fascinating things about the data is that although the effect of hearing profile was statistically significant, it was not practically significant: the means for the normal hearing, hearing loss, and tinnitus groups were 65, 61, and 63 rationalized arcsine units, respectively (Figure 4).  Hearing profile also did not interact with any of the other variables under test.  One might conclude from the data that if the listener can control the level of presentation, the speech-to-noise ratio has about the same effect regardless of hearing loss.  Performance with the TCAPS did not differ by hearing profile; however, the Aftershockz headset provided better speech intelligibility for all listeners.


Figure 4. Mean rationalized arcsine units observed as a function of the hearing profile of the listener.

3pSC10 – Does increasing the playback speed of men’s and women’s voices reduce their intelligibility by the same amount?

Eric M. Johnson – eric.martin.johnson@utah.edu
Sarah Hargus Ferguson – sarah.ferguson@hsc.utah.edu

Department of Communication Sciences and Disorders
University of Utah
390 South 1530 East, Room 1201
Salt Lake City, UT 84112

Popular version of poster 3pSC10, “Gender and rate effects on speech intelligibility.”
Presented Wednesday afternoon, May 25, 2016, 1:00, Salon G
171st ASA Meeting, Salt Lake City

Older adults seeking hearing help often report having an especially hard time understanding women’s voices. However, this anecdotal observation doesn’t always agree with the findings from scientific studies. For example, Ferguson (2012) found that male and female talkers were equally intelligible for older adults with hearing loss. Moreover, several studies have found that young people with normal hearing actually understand women’s voices better than men’s voices (e.g. Bradlow et al., 1996; Ferguson, 2004). In contrast, Larsby et al. (2015) found that, when listening in background noise, groups of listeners with and without hearing loss were better at understanding a man’s voice than a woman’s voice. The Larsby et al. data suggest that female speech might be more affected by distortion like background noise than male speech is, which could explain why women’s voices may be harder to understand for some people.

We were interested to see if another type of distortion, speeding up the speech, would have an equal effect on the intelligibility of men and women. Speech that has been sped up (or time-compressed) has been shown to be less intelligible than unprocessed speech (e.g. Gordon-Salant & Friedman, 2011), but no studies have explored whether time compression causes an equal loss of intelligibility for male and female talkers. If an increase in playback speed causes women’s speech to be less intelligible than men’s, it could reveal another possible reason why so many older adults with hearing loss report difficulty understanding women’s voices. To this end, our study tested whether the intelligibility of time-compressed speech decreases for female talkers more than it does for male talkers.

Using 32 listeners with normal hearing, we measured how much the intelligibility of two men and two women went down when the playback speed of their speech was increased by 50%. These four talkers were selected based on their nearly equivalent conversational speaking rates. We used digital recordings of each talker and made two different versions of each sentence they spoke: a normal-speed version and a fast version. The software we used allowed us to speed up the recordings without making them sound high-pitched.
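The paper does not name the processing software, so the following is only a minimal sketch of the same kind of manipulation using librosa’s phase vocoder, which changes duration without shifting pitch; the file names are hypothetical.

```python
import librosa
import soundfile as sf

# Load a sentence recording (mono), keeping its native sample rate.
y, sr = librosa.load("sentence.wav", sr=None)

# Phase-vocoder time stretch: rate > 1 shortens the signal without
# raising the pitch; rate = 1.5 plays back 50% faster, as in the study.
y_fast = librosa.effects.time_stretch(y, rate=1.5)

sf.write("sentence_fast.wav", y_fast, sr)
```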

Audio sample 1: A sentence at its original speed.

Audio sample 2: The same sentence sped up to 50% faster than its original speed.

All of the sentences were presented to the listeners in background noise. We found that the men and women were essentially equally intelligible when listeners heard the sentences at their original speed. Speeding up the sentences made all of the talkers harder to understand, but the effect was much greater for the female talkers than the male talkers. In other words, there was a significant interaction between talker gender and playback speed. The results suggest that time-compression has a greater negative effect on the intelligibility of female speech than it does on male speech.


Figure 1: Overall percent correct key-word identification performance for male and female talkers in unprocessed and time-compressed conditions. Error bars indicate 95% confidence intervals.

These results confirm the negative effects of time-compression on speech intelligibility and imply that audiologists should counsel the communication partners of their patients to avoid speaking excessively fast, especially if the patient complains of difficulty understanding women’s voices. This counsel may be even more important for the communication partners of patients who experience particular difficulty understanding speech in noise.

 

  1. Bradlow, A. R., Torretta, G. M., and Pisoni, D. B. (1996). “Intelligibility of normal speech I: Global and fine-grained acoustic-phonetic talker characteristics,” Speech Commun. 20, 255-272.
  2. Ferguson, S. H. (2004). “Talker differences in clear and conversational speech: Vowel intelligibility for normal-hearing listeners,” J. Acoust. Soc. Am. 116, 2365-2373.
  3. Ferguson, S. H. (2012). “Talker differences in clear and conversational speech: Vowel intelligibility for older adults with hearing loss,” J. Speech Lang. Hear. Res. 55, 779-790.
  4. Gordon-Salant, S., and Friedman, S. A. (2011). “Recognition of rapid speech by blind and sighted older adults,” J. Speech Lang. Hear. Res. 54, 622-631.
  5. Larsby, B., Hällgren, M., Nilsson, L., and McAllister, A. (2015). “The influence of female versus male speakers’ voice on speech recognition thresholds in noise: Effects of low- and high-frequency hearing impairment,” Speech Lang. Hear. 18, 83-90.

1aAA4 – Optimizing the signal to noise ratio in classrooms using passive acoustics

Peter D’Antonio – pdantonio@rpginc.com

RPG Diffusor Systems, Inc.
651 Commerce Dr
Upper Marlboro, MD 20774

Popular version of paper 1aAA4 “Optimizing the signal to noise ratio in classrooms using passive acoustics”
Presented Monday, May 23, 2016, 10:20 am – 5:00 pm, Salon I
171st ASA Meeting, Salt Lake City

The 2012 Programme for International Student Assessment (PISA) carried out an international comparative trial of student performance in reading comprehension, mathematics, and natural science.  The US ranked 36th out of 64 countries testing half a million 15-year-olds, as shown in Figure 1.


Figure 1 PISA Study

What is the problem? Existing acoustical designs and products have not evolved to incorporate the current state of the art, and the result is schools that fail to meet their intended goals. Learning areas are only beginning to include lighting with adjustable intensity and color, which has been shown to increase reading speed, reduce testing errors, and reduce hyperactivity. Meanwhile, existing acoustical designs remain limited to conventional absorptive-only acoustical materials, such as thin fabric-wrapped panels and acoustical ceiling tiles, which cannot address all of the challenges of speech intelligibility and music appreciation.

What is the solution? First, adopt modern products and designs for core and ancillary learning spaces that utilize binary, ternary, quaternary, and other transitional hybrid surfaces, which simultaneously scatter the consonant-carrying high-frequency early reflections and absorb mid and low frequencies, passively improving the signal-to-noise ratio. Second, adopt the recommendations of ANSI/ASA S12.60 to control reverberation, background noise, and noise intrusion. Third, integrate lighting that adjusts to the task at hand.
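As background (not a formula from the paper), the reverberation that such standards seek to control is commonly estimated with Sabine’s equation, where V is the room volume in cubic meters and the denominator sums each surface area S_i times its absorption coefficient α_i:

\[ T_{60} \approx \frac{0.161\,V}{\sum_i S_i\,\alpha_i} \]

The hybrid surfaces described above work on the denominator at low and mid frequencies while deliberately leaving high-frequency energy to be scattered rather than absorbed.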

Let’s begin by considering how we hear and understand what is being said when information is delivered via the spoken word.  We often hear people say, “I can hear what he or she is saying, but I cannot understand what is being said.”  The understanding of speech is referred to as speech intelligibility.  How do we interpret speech?  The ear/brain processor can fill in a substantial amount of missing information in music, but it requires more detailed information to understand speech.  The speech power is delivered in the vowels (a, e, i, o, u, and sometimes y), which lie predominantly in the frequency range of 250 Hz to 500 Hz.  The speech intelligibility is delivered in the consonants (b, c, d, f, g, h, j, k, l, m, n, p, q, r, s, t, v, w), which occur in the 2,000 Hz to 6,000 Hz frequency range.  People who suffer from noise-induced hearing loss typically have a 4,000 Hz notch, which causes severe degradation of speech intelligibility.  This raises the question: why would we want to use exclusively absorption on the entire ceiling of a speech room, and thin fabric-wrapped panels on a significant proportion of the wall area, when these porous materials absorb these important consonant frequencies and prevent them from fusing with the direct sound to make it louder and more intelligible?

Treating the entire ceiling exclusively with absorbing material may excessively reduce the high-frequency consonant sounds, resulting in the masking of high-frequency consonants by low-frequency vowel sounds and thereby reducing the signal-to-noise ratio (SNR).

The signal has two contributions: the direct line-of-sight sound and the early reflections arriving from the walls, ceiling, floor, and the people and items in the room.  Our auditory system, the ears and brain together, has a unique ability called temporal fusion, which combines or fuses these two contributions into one apparently louder and more intelligible signal.  The goal, then, is to utilize these passive early reflections as efficiently as possible to increase the signal.  The denominator of the SNR consists of external noise intrusion, occupant noise, HVAC noise, and reverberation.  These ideas are summarized in Figure 2.
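In symbols (my summary of the idea in Figure 2, not a formula from the paper), the ratio the design tries to maximize is:

\[ \mathrm{SNR} = \frac{E_{\text{direct}} + E_{\text{early reflections}}}{E_{\text{noise intrusion}} + E_{\text{occupant}} + E_{\text{HVAC}} + E_{\text{reverberation}}} \]

where each E denotes the acoustic energy of that contribution at the listener.  Diffusive surfaces raise the numerator; absorption and noise control shrink the denominator.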


Figure 2 Signal to Noise Ratio

In Figure 3, we illustrate a concept model for an improved speech environment, whether it is a classroom, a lecture hall, or a meeting/conference room; essentially, any room in which information is being conveyed.

The design includes a reflective front, because the vertical and horizontal divergence of the consonants is roughly 120 degrees; if a speaker turns away from the audience, the consonants must reflect from the front wall and the ceiling overhead to reach listeners.  The perimeter of the ceiling is absorptive to control the reverberation (noise).  The center of the ceiling is diffusive to provide early reflections that increase the signal and its coverage in the room.  The middle third of the walls utilizes novel binary, ternary, quaternary, and other transitional diffsorptive (diffusive/absorptive) panels, which scatter the information above 1 kHz (the signal) and absorb the sound below 1 kHz (the reverberation, i.e., noise).  This design suggests that the current exclusive use of acoustical ceiling tile and traditional fabric-wrapped panels is counterproductive to improving the SNR, speech intelligibility, and coverage.


Figure 3 Concept model for a classroom with a high SNR

2pSCb11 – Effect of Menstrual Cycle Hormone Variations on Dichotic Listening Results

Richard Morris – Richard.morris@cci.fsu.edu
Alissa Smith

Florida State University
Tallahassee, Florida

Popular version of poster presentation 2pSCb11, “Effect of menstrual phase on dichotic listening”
Presented Tuesday afternoon, November 3, 2015, 3:30 PM, Grand Ballroom 8

How speech is processed by the brain has long been of interest to researchers and clinicians. One method to evaluate how the two sides of the brain work when hearing speech is called a dichotic listening task. In a dichotic listening task two words are presented simultaneously to a participant’s left and right ears via headphones. One word is presented to the left ear and a different one to the right ear. These words are spoken at the same pitch and loudness levels. The listener then indicates what word was heard. If the listener regularly reports hearing the words presented to one ear, then there is an ear advantage. Since most language processing occurs in the left hemisphere of the brain, most listeners attend more closely to the right ear. The regular selection of the word presented to the right ear is termed a right ear advantage (REA).
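Ear advantage is often quantified with a simple laterality index; one common form is shown below (this is illustrative background, not necessarily the metric used in this study), where R and L are the numbers of correct right-ear and left-ear reports:

\[ \mathrm{LI} = 100 \times \frac{R - L}{R + L} \]

Positive values indicate a right ear advantage, negative values a left ear advantage.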

Previous researchers reported different responses from males and females to dichotic presentation of words.  Those investigators found that males more consistently heard the word presented to the right ear, demonstrating a stronger REA.  The female listeners in those studies showed more variability in which ear’s word was heard.  Further research suggested that women exhibit different lateralization of speech processing at different phases of their menstrual cycle.  In addition, data from recent studies indicate that the degree to which women can focus on the input to one ear or the other varies with their menstrual cycle.

However, the previous studies used small numbers of participants.  The purpose of the present study was to complete a dichotic listening study with a larger sample of female participants.  In addition, the previous studies focused on women who did not take oral contraceptives, because women taking oral contraceptives were assumed to have smaller shifts in the lateralization of speech processing; although this assumption is reasonable, it needs to be tested.  For this study, it was hypothesized that the women would exhibit a greater REA during the days that they menstruate than during other days of their menstrual cycle, based on the previous research reports.  It was further hypothesized that the women taking oral contraceptives would exhibit smaller fluctuations in the lateralization of their speech processing.

Participants in the study were 64 females, 19-25 years of age. Among the women 41 were taking oral contraceptives (OC) and 23 were not. The participants listened to the sound files during nine sessions that occurred once per week. All of the women were in good general health and had no speech, language, or hearing deficits.

The dichotic listening task was administered using the Alvin software package for speech perception research.  The sound files consisted of consonant-vowel syllables composed of the six plosive consonants /b/, /d/, /g/, /p/, /t/, and /k/ paired with the vowel “ah”.  The listeners heard the syllables over stereo headphones, and each listener set the loudness of the syllables to a comfortable level.
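Presentation here was handled by Alvin; purely as an illustration of what a dichotic stimulus is at the signal level (file names are hypothetical, and mono recordings are assumed), one can place a different syllable in each channel of a stereo file:

```python
import numpy as np
import soundfile as sf

# Load two CV syllables recorded at the same sample rate (file names are
# hypothetical; the study used /b d g p t k/ + "ah" tokens).
left, sr = sf.read("ba.wav")
right, sr2 = sf.read("da.wav")
assert sr == sr2, "both tokens must share a sample rate"

# Pad the shorter token with silence so the two channels align in time.
n = max(len(left), len(right))
left = np.pad(left, (0, n - len(left)))
right = np.pad(right, (0, n - len(right)))

# Stack into a stereo file: column 0 -> left ear, column 1 -> right ear.
dichotic = np.column_stack([left, right])
sf.write("dichotic_ba_da.wav", dichotic, sr)
```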

At the beginning of each listening session, each participant wrote down the date of the onset of her most recent menstrual period on a participant sheet identified by her participant number.  She then heard the recorded syllables and indicated the consonant she heard by striking that key on the computer keyboard.  Each listening session consisted of three presentations of the syllables, with a different randomization of the syllables for each presentation.  In the first presentation, the stimuli were presented in a non-forced condition, in which the listener indicated the plosive that she heard most clearly.  After the first presentation, the experimental files were presented in forced-left and forced-right conditions, in which the participant was directed to focus on the signal in the left or the right ear.  The sequence of focusing on the left ear or the right ear was counterbalanced over the sessions.

The statistical analyses of the listeners’ responses revealed that no significant differences occurred between the women using oral contraceptives and those who did not. In addition, correlations between the day of the women’s menstrual cycle and their responses were consistently low. However, some patterns did emerge for the women’s responses across the experimental sessions as opposed to the days of their menstrual cycle. The participants in both groups exhibited a higher REA and lower percentage of errors for the final sessions in comparison to earlier sessions.

The results from the current participants differ from those previously reported.  Possibly the larger sample size of the current study, the additional month of data collection, or the data recording method affected the results.  The larger sample might better represent how most women respond to dichotic listening tasks.  The additional month of data collection may have allowed the women to learn the task and then respond in a more consistent manner; in earlier studies, a short data collection period may have confounded learning to respond to a novel task with a hormonally dependent response.  Finally, previous studies had the experimenter record the participants’ responses, a method that may have introduced bias into the data collection.  Further studies with large data sets and multiple months of data collection are needed to determine any effects of sex and oral contraceptive use on the REA.