3pSC10 – Does increasing the playback speed of men’s and women’s voices reduce their intelligibility by the same amount? – Eric M. Johnson, Sarah Hargus Ferguson

Does increasing the playback speed of men’s and women’s voices reduce their intelligibility by the same amount?


Eric M. Johnson – eric.martin.johnson@utah.edu

Sarah Hargus Ferguson – sarah.ferguson@hsc.utah.edu

Department of Communication Sciences and Disorders
University of Utah
390 South 1530 East, Room 1201
Salt Lake City, UT 84112


Popular version of poster 3pSC10, “Gender and rate effects on speech intelligibility.”

Presented Wednesday afternoon, May 25, 2016, 1:00, Salon G

171st ASA Meeting, Salt Lake City

Older adults seeking hearing help often report having an especially hard time understanding women’s voices. However, this anecdotal observation doesn’t always agree with the findings from scientific studies. For example, Ferguson (2012) found that male and female talkers were equally intelligible for older adults with hearing loss. Moreover, several studies have found that young people with normal hearing actually understand women’s voices better than men’s voices (e.g. Bradlow et al., 1996; Ferguson, 2004). In contrast, Larsby et al. (2015) found that, when listening in background noise, groups of listeners with and without hearing loss were better at understanding a man’s voice than a woman’s voice. The Larsby et al. data suggest that female speech might be more affected by distortion like background noise than male speech is, which could explain why women’s voices may be harder to understand for some people.

We were interested to see if another type of distortion, speeding up the speech, would have an equal effect on the intelligibility of men and women. Speech that has been sped up (or time-compressed) has been shown to be less intelligible than unprocessed speech (e.g. Gordon-Salant & Friedman, 2011), but no studies have explored whether time compression causes an equal loss of intelligibility for male and female talkers. If an increase in playback speed causes women’s speech to be less intelligible than men’s, it could reveal another possible reason why so many older adults with hearing loss report difficulty understanding women’s voices. To this end, our study tested whether the intelligibility of time-compressed speech decreases for female talkers more than it does for male talkers.

Using 32 listeners with normal hearing, we measured how much the intelligibility of two men and two women went down when the playback speed of their speech was increased by 50%. These four talkers were selected based on their nearly equivalent conversational speaking rates. We used digital recordings of each talker and made two different versions of each sentence they spoke: a normal-speed version and a fast version. The software we used allowed us to speed up the recordings without making them sound high-pitched.
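To make the timing concrete, here is a minimal sketch of what a 50% increase in playback speed does to a sentence's duration. The 3-second sentence length is a made-up value for illustration, not a figure from the study, and the function name is hypothetical.

```python
def compressed_duration(original_seconds: float, speed_increase_percent: float) -> float:
    """Return the duration after raising playback speed by the given percentage."""
    rate = 1.0 + speed_increase_percent / 100.0  # 50% faster -> rate of 1.5
    return original_seconds / rate


# Hypothetical 3-second sentence, sped up by 50% as in the study.
fast = compressed_duration(3.0, 50.0)
print(f"{fast:.2f} s")  # the sentence now takes two thirds of its original time
```

Note that playing speech 50% faster shortens it to about 67% of its original length, not to 50%.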

Audio sample 1: A sentence at its original speed.

Audio sample 2: The same sentence sped up to 50% faster than its original speed.


All of the sentences were presented to the listeners in background noise. We found that the men and women were essentially equally intelligible when listeners heard the sentences at their original speed. Speeding up the sentences made all of the talkers harder to understand, but the effect was much greater for the female talkers than the male talkers. In other words, there was a significant interaction between talker gender and playback speed. The results suggest that time-compression has a greater negative effect on the intelligibility of female speech than it does on male speech.
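An interaction in a two-by-two design like this one can be read as a difference of differences: the drop in intelligibility caused by speeding up, compared between the two talker groups. The sketch below shows only that arithmetic. The percent-correct values are invented placeholders, not the study's results, and the function names are hypothetical.

```python
def speed_cost(normal_pct: float, fast_pct: float) -> float:
    """Drop in percent correct when speech is sped up."""
    return normal_pct - fast_pct


def interaction(scores: dict) -> float:
    """Extra drop for female talkers relative to male talkers."""
    return speed_cost(*scores["female"]) - speed_cost(*scores["male"])


# Placeholder (normal, fast) percent-correct values, NOT the study's data.
made_up_scores = {"male": (80.0, 70.0), "female": (80.0, 60.0)}
print(interaction(made_up_scores))  # positive: larger drop for female talkers
```

A value near zero would mean both groups lost about the same amount of intelligibility. Deciding whether a nonzero value is statistically significant requires a proper statistical test, which this sketch does not attempt.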

Figure 1: Overall percent correct key-word identification performance for male and female talkers in unprocessed and time-compressed conditions. Error bars indicate 95% confidence intervals.


These results confirm the negative effects of time-compression on speech intelligibility and imply that audiologists should counsel the communication partners of their patients to avoid speaking excessively fast, especially if the patient complains of difficulty understanding women’s voices. This counsel may be even more important for the communication partners of patients who experience particular difficulty understanding speech in noise.


  1. Bradlow, A. R., Torretta, G. M., and Pisoni, D. B. (1996). “Intelligibility of normal speech I: Global and fine-grained acoustic-phonetic talker characteristics,” Speech Commun. 20, 255-272.
  2. Ferguson, S. H. (2004). “Talker differences in clear and conversational speech: Vowel intelligibility for normal-hearing listeners,” J. Acoust. Soc. Am. 116, 2365-2373.
  3. Ferguson, S. H. (2012). “Talker differences in clear and conversational speech: Vowel intelligibility for older adults with hearing loss,” J. Speech Lang. Hear. Res. 55, 779-790.
  4. Gordon-Salant, S., and Friedman, S. A. (2011). “Recognition of rapid speech by blind and sighted older adults,” J. Speech Lang. Hear. Res. 54, 622-631.
  5. Larsby, B., Hällgren, M., Nilsson, L., and McAllister, A. (2015). “The influence of female versus male speakers’ voice on speech recognition thresholds in noise: Effects of low- and high-frequency hearing impairment,” Speech Lang. Hear. 18, 83-90.



2aSC7 – Effects of aging on speech breathing – Simone Graetzer, PhD., Eric J. Hunter, PhD.

Simone Graetzer, PhD. – sgraetz@msu.edu

Eric J. Hunter, PhD. – ejhunter@msu.edu


Voice Biomechanics and Acoustics Laboratory
Department of Communicative Sciences and Disorders
College of Communication Arts & Sciences
Michigan State University
1026 Red Cedar Road
East Lansing, MI 48824


Popular version of paper 2aSC7, entitled: “A longitudinal study of the effects of aging on speech breathing: Evidence of decreased expiratory volume in speech recordings”

Presented Tuesday morning, May 24, 2016, 8:00 – 11:30 AM, Salon F

171st ASA Meeting, Salt Lake City



Older adults are the fastest growing segment of the population, and some voice, speech and breathing disorders occur more frequently as individuals age. For example, lung capacity diminishes in older age due to loss of lung elasticity, which places an upper limit on utterance duration. Further, the lungs and diaphragm can lose elasticity and muscle strength, and the rib cage can stiffen, leading to reductions in lung pressure and in the volume of air that can be expelled by the lungs (‘expiratory volume’). In the laryngeal system, tissues can break down and cartilages can harden, causing more voice breaks, increased hoarseness or harshness, reduced loudness, and pitch changes.

Our study attempted to identify the normal speech and respiratory changes that accompany aging in healthy individuals. Specifically, we examined how long individuals could speak in a single breath group using a series of speeches from six individuals (three females and three males) over the course of many years (between 18 and 49 years). All speakers had been previously recorded in similar environments giving long, monologue speeches. All but one speaker gave their addresses at a podium using a microphone, and most speeches were longer than 30 minutes. The speakers’ ages ranged from a minimum of 43 years (51 on average) to a maximum of 98 years (84 on average). Samples of five minutes in length were extracted from each recording. Subsequently, for each subject, three raters identified the durations of exhalations during speech in these samples.
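As a minimal sketch of how rater annotations could be turned into a single breath group duration for one sample: each rater marks the start and end of every exhalation, and the durations are averaged. The timestamps below are invented, and averaging each rater's mean is an assumption made for illustration; the source does not state how the three raters' measurements were combined.

```python
from statistics import mean


def durations(intervals):
    """Convert (start, end) times in seconds into durations."""
    return [end - start for start, end in intervals]


def mean_breath_group_duration(raters):
    """Average the per-rater mean duration (an assumed way to combine raters)."""
    return mean(mean(durations(intervals)) for intervals in raters)


# Invented annotations for one sample; each inner list is one rater.
hypothetical_sample = [
    [(0.0, 2.5), (3.0, 6.0)],  # rater A
    [(0.0, 2.5), (3.5, 6.0)],  # rater B
    [(0.5, 2.5), (3.0, 6.0)],  # rater C
]
print(round(mean_breath_group_duration(hypothetical_sample), 2))
```

Repeating this for samples recorded at different ages would give the kind of duration-versus-age trend described in the study.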

Two figures illustrate how the breath groups changed with age for one of the women (Figure 1) and one of the men (Figure 2). We found a change in speech breathing, which might be caused by a less flexible rib cage and the loss of vital capacity and expiratory volume. In males especially, it may also have been caused by poor closure of the vocal folds, resulting in more air leakage during speech. Specifically, we found a decreased breath group duration for all male subjects after 70 years, with overall durations averaging between 1 and 3.5 seconds. Importantly, the point of change appeared to occur between 60 and 65 years. For females, this change occurred at a later time, between 60 and 70 years, with durations averaging between 1.5 and 3.5 seconds.



Figure 1. For one of the women talkers, the speech breath groups were measured and plotted to correspond with age. The length of the speech breath groups begins to decrease at about 68 years of age.

Figure 2. For one of the men talkers, the speech breath groups were measured and plotted to correspond with age. The length of the speech breath groups begins to decrease at about 66 years of age.


The study results indicate that speech breath group duration decreased for most individuals as they aged, typically from 65 to 70 years onwards, consistent with the age-related decline in expiratory volume reported in other studies. There was some variation between individuals in the point at which the durations started to decrease. The decreases indicate that, as they aged, speakers could not sustain the same number of words in a breath group and needed to inhale more frequently while speaking.

Future studies involving more participants may further our understanding of normal age-related changes vs. pathology, but such a corpus of recordings must first be constrained on the basis of communicative intent, venues, knowledge of vocal coaching, and related information.





We acknowledge the efforts of Amy Kemp, Lauren Glowski, Rebecca Wallington, Allison Woodberg, Andrew Lee, Saisha Johnson, and Carly Miller. Research was supported in part by the National Institute on Deafness and Other Communication Disorders of the National Institutes of Health under Award Number R01DC012315. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.



2aBAa5 – Sound Waves Helps Assess Bone Condition – Max Denis

Sound Waves Helps Assess Bone Condition


Max Denis – denis.max@mayo.edu, 507-266-7449

Leighton Wan – wan.leighton@mayo.edu

Matthew Cheong – cheong.matthew@mayo.edu

Mostafa Fatemi – fatemi.mostafa@mayo.edu

Azra Alizad – alizad.azra@mayo.edu, 507-254-5970


Mayo Clinic College of Medicine
200 1st St SW
Rochester, MN 55905


Popular version of paper 2aBAa5, “Bone demineralization assessment using acoustic radiation force”

Presented Tuesday morning, May 24, 2016, 9:00 AM in Snowbird/Brighton room

171st ASA Meeting, Salt Lake City, Utah


Assessing skeletal health is important at every stage of life, from newborn infants to the elderly. Each year, approximately fifty percent of the 550,000 premature newborn infants in the United States suffer from bone metabolism related disorders such as osteopenia, which affect the bone development process into childhood. As we age through adulthood, bone mass loss increases because of unbalanced activity in the bone reformation process, leading to bone diseases such as osteoporosis and putting a person at risk for fractures in the neck, hip and forearm areas.

Current bone assessment tools include dual-energy X-ray absorptiometry (DEXA) and quantitative ultrasound (QUS). DEXA is the leading clinical bone quality assessment tool, detecting small changes in bone mineral content and density. However, DEXA uses ionizing radiation for imaging, which exposes patients to radiation, albeit at very low doses. This can be problematic when frequent clinical visits are needed to monitor the efficacy of prescribed medications and therapies.

QUS has been sought as a nonionizing and noninvasive alternative to DEXA. QUS utilizes measurements of ultrasonic waves between a transmitting and a receiving transducer aligned in parallel along the bone surface. Speed of sound (SOS) measurements of the received ultrasonic signal are used to characterize the bone material properties. However, the SOS estimate is sensitive to the amount of soft tissue between the skin surface and the bone. We therefore propose using a high-intensity ultrasonic wave known as a “push beam” to exert a force on the bone surface, thereby generating vibrations, which minimizes the effects of the soft tissue. The sound waves radiated by these vibrations are captured and used to analyze the bone mechanical properties.

This work demonstrates the feasibility of evaluating bone mechanical properties from sound waves due to bone vibrations. Under an approved protocol by the Mayo Clinic Institutional Review Board (IRB), human volunteers were recruited to undergo our noninvasive bone assessment technique. Our cohort consisted of clinically confirmed osteopenia and osteoporosis patients, as well as normal volunteers without a history of bone fractures. An ultrasound probe and hydrophone were placed along the volunteers’ tibia bone (Figure 1a). A B-mode ultrasound was used to guide the placement of our push beam focal point onto the bone surface underneath the skin layer (Figure 1b). The SOS was obtained from the measurements.


Figure 1. (a) Probe and hydrophone alignment along the tibia bone. (b) Diagram of an image-guided push beam focal point excitation on the bone surface.

In total, 14 volunteers have been recruited in our ongoing study. A boxplot comparison of SOS between normal and bone diseased (osteopenic and osteoporotic) volunteers in Figure 2 shows that sound typically travels faster in healthy bones than in osteopenic and osteoporotic bones, with median SOS values (red line) of 3733 m/s and 2566 m/s, respectively. Hence, our technique may be useful as a noninvasive method for monitoring the skeletal health status of the premature and aging population.


Figure 2. Speed of sound comparison between normal and bone diseased volunteers.


This ongoing project is being done under an approved protocol by Mayo Institutional Review Board.


2pAAa10 – Turn around when you’re talking to me! – Jennifer Whiting, Timothy Leishman, PhD, K.J. Bodon

Turn around when you’re talking to me!

Jennifer Whiting – jkwhiting@physics.byu.edu

Timothy Leishman, PhD – tim_leishman@physics.byu.edu

K.J. Bodon – joshuabodon@gmail.com

Brigham Young University

N283 Eyring Science Center

Provo, UT 84602


Popular version of paper 2pAAa10, “High-resolution measurements of speech directivity”

Presented Tuesday afternoon, November 3, 2015, 4:40 PM, Grand Ballroom 3

170th ASA Meeting, Jacksonville


In general, most sources of sound do not radiate equally in all directions. The human voice is no exception to this rule. How strongly sound is radiated in a given direction at a specific frequency, or pitch, is called directivity. While many [references] have studied the directivity of speaking and singing voices, some important details are missing. The research reported in this presentation measured directivity of live speech at higher angular and frequency resolutions than have been previously measured, in an effort to capture the missing details.

Measurement methods

The approach uses a semicircular array of 37 microphones spaced at five-degree polar-angle increments (see Figure 1). A subject sits on a computer-controlled rotating chair with his or her mouth aligned with the axis of rotation and the center of the microphone array. He or she repeats a series of phonetically balanced sentences at each of 72 five-degree azimuthal-angle increments. This results in 2522 unique measurement points on a sphere around the subject.
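The count of 2522 can be checked with a short sketch. It assumes the 37 microphones span polar angles from 0 to 180 degrees, so that the top and bottom microphones sit on the rotation axis and record the same direction at every chair position. Under that assumption, 37 × 72 = 2664 recordings collapse to 35 × 72 + 2 = 2522 distinct directions. The function name is hypothetical.

```python
import math


def unique_directions(polar_step_deg=5, azimuth_step_deg=5):
    """Count distinct directions on a sphere sampled on a regular angular grid."""
    points = set()
    for polar in range(0, 181, polar_step_deg):        # 37 microphone positions
        for azimuth in range(0, 360, azimuth_step_deg):  # 72 chair rotations
            theta, phi = math.radians(polar), math.radians(azimuth)
            xyz = (
                math.sin(theta) * math.cos(phi),
                math.sin(theta) * math.sin(phi),
                math.cos(theta),
            )
            # Rounding merges the pole positions, which differ only by float error.
            points.add(tuple(round(c, 9) for c in xyz))
    return len(points)


print(37 * 72)              # total recordings on the grid
print(unique_directions())  # distinct directions once the poles are merged
```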


[Figure 1. A subject and the measurement array]


The measurements are based on audio recordings of the subject, who tries to repeat the sentences with exactly the same timing and inflection at each rotation. To account for the inevitable differences between repetitions, a transfer function and the coherence between a reference microphone near the subject and each measurement microphone on the semicircular array are computed. The coherence is used to assess the quality of each measurement. The transfer functions for all measurement points make up the directivity. To visualize the results, each measurement is plotted on a sphere, where the color and the radius at each point indicate how strongly sound is radiated in that direction for a given frequency. Animations of these spherical plots show how the directivity differs for each frequency.
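For readers curious about the computation, the sketch below shows one common way to estimate a transfer function and coherence between two signals: average the cross-spectrum and the two power spectra over short segments, then form their ratios. This is a generic textbook estimator written in plain Python, not necessarily the authors' processing. It omits windowing and overlap for brevity, and the function names and test signals are hypothetical.

```python
import cmath
import math
import random


def dft(segment):
    """Naive discrete Fourier transform, non-negative frequency bins only."""
    n = len(segment)
    return [
        sum(segment[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def transfer_and_coherence(reference, measured, seg_len=32):
    """Segment-averaged transfer function (Sxy / Sxx) and magnitude-squared coherence."""
    bins = seg_len // 2 + 1
    sxx, syy, sxy = [0.0] * bins, [0.0] * bins, [0j] * bins
    for start in range(0, len(reference) - seg_len + 1, seg_len):
        x = dft(reference[start:start + seg_len])
        y = dft(measured[start:start + seg_len])
        for k in range(bins):
            sxx[k] += abs(x[k]) ** 2
            syy[k] += abs(y[k]) ** 2
            sxy[k] += x[k].conjugate() * y[k]
    transfer = [sxy[k] / sxx[k] for k in range(bins)]
    coherence = [abs(sxy[k]) ** 2 / (sxx[k] * syy[k]) for k in range(bins)]
    return transfer, coherence


# Synthetic test signals: the "measurement" is the reference at half amplitude.
rng = random.Random(0)
ref = [rng.gauss(0.0, 1.0) for _ in range(512)]
clean = [0.5 * v for v in ref]
noisy = [0.5 * v + rng.gauss(0.0, 1.0) for v in ref]

h_clean, coh_clean = transfer_and_coherence(ref, clean)
_, coh_noisy = transfer_and_coherence(ref, noisy)
print(abs(h_clean[5]), coh_clean[5])     # gain near 0.5, coherence near 1
print(sum(coh_noisy) / len(coh_noisy))   # unrelated noise lowers the coherence
```

The coherence stays near 1 when the measured signal is a scaled copy of the reference and falls when unrelated noise is added, which is why it serves as a quality check on each measurement point.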

[Figure 2. Balloon plot for male speech directivity at 500 and 1000 Hz.]

[Figure 3. Balloon plot for female speech directivity at 500 and 1000 Hz.]


[Animation 1. Male Speech Directivity, animated]

[Animation 2. Female Speech Directivity, animated]

Results and Conclusions

Some unique results are visible in the animations. Most importantly, as frequency increases, one can see that most of the sound is radiated in the forward direction. This is one reason why it’s hard to hear someone talking in the front of a car when you’re sitting in the back, unless they turn around to talk to you. One can also see in the animations that as frequency increases, and most of the sound radiates forwards, there is poor coherence in the back area. This doesn’t necessarily indicate a poor measurement, just a poor signal-to-noise ratio, since there is little sound energy in that direction. It’s also interesting to see that the polar angle of the strongest radiation changes with frequency. At some frequencies the sound is radiated strongly downward and to the sides, but at other frequencies the sound is radiated strongly upwards and forwards. Male and female directivities are similar in shape, but at different frequencies, since the fundamental frequencies of male and female voices are so different.

A more complete understanding of speech directivity has great benefits to several industries. For example, hearing aid companies can use speech directivity patterns to know where to aim microphones in the hearing aids to pick up the best sound for the hearing aid wearer having a conversation. Microphone placement in cell phones can be adjusted to get a clearer signal from those talking into the cell phone. The theater and audio industries can use directivity patterns to assist in positioning actors on stage, or placing microphones near the speakers to record the most spectrally rich speech. The scientific community can develop more complete models for human speech based on these measurements. Further study on this subject will allow researchers to improve the measurement method and analysis techniques to more fully understand the results, and to generalize them to all speech containing phonemes similar to those in these measurements.

2aSP5 – Using Automatic Speech Recognition to Identify Dementia in Early Stages – Roozbeh Sadeghian, J. David Schaffer, and Stephen A. Zahorian

Using Automatic Speech Recognition to Identify Dementia in Early Stages

Roozbeh Sadeghian, J. David Schaffer, and Stephen A. Zahorian
SUNY at Binghamton
Binghamton, NY


Popular version of paper 2aSP5, “Using automatic speech recognition to identify dementia in early stages”
Presented Tuesday morning, November 3, 2015, 10:15 AM, City Terrace room
170th ASA Meeting, Jacksonville, FL

The clinical diagnosis of Alzheimer’s disease (AD) and other dementias is very challenging, especially in the early stages. These conditions are widely believed to be underdiagnosed, at least partially because of the lack of a reliable non-invasive diagnostic test. Additionally, recruitment for clinical trials of experimental dementia therapies might be improved with a highly specific test. Although there is much active research into new biomarkers for AD, most of these methods are expensive and/or invasive, such as brain imaging (often with radioactive tracers) or blood or spinal fluid sampling followed by expensive lab procedures.

There are good indications that dementias can be characterized by several aphasias (defects in the use of speech). This seems plausible since speech production involves many brain regions, and thus a disease that affects particular regions involved in speech processing might leave detectable fingerprints in the speech. Computerized analysis of speech signals and computational linguistics (analysis of word patterns) have progressed to the point where an automatic speech analysis system could be within reach as a tool for detection of dementia. The long-term goal is an inexpensive, short-duration, non-invasive test for dementia, one that can be administered in an office or home by clinicians with minimal training.

If a pilot study (cross sectional design: only one sample from each subject) indicates that suitable combinations of features derived from a voice sample can strongly indicate disease, then the research will move to a longitudinal design (many samples collected over time) where sizable cohorts will be followed so that early indicators might be discovered.

A simple procedure for acquiring speech samples is to ask subjects to describe a picture (see Figure 1). Some such samples are available on the web (DementiaBank), but they were collected long ago and the audio quality is often poor. We used 140 of these older samples, but also collected 71 new samples with good quality audio. Roughly half of the samples had a clinical diagnosis of probable AD, and the others were demographically similar and cognitively normal (NL).

One hundred twenty eight features were automatically extracted from speech signals, including pauses and pitch variation (indicating emotion); word-use features were extracted from manually-prepared transcripts. In addition, we had the results of a popular cognitive test, the mini mental state exam (MMSE) for all subjects. While widely used as an indicator of cognitive difficulties, the MMSE is not sufficiently diagnostic for dementia by itself. We searched for patterns with and without the MMSE. This gives the possibility of a clinical test that combines speech with the MMSE. Multiple patterns were found using an advanced pattern discovery approach (genetic algorithms with support vector machines). The performances of two example patterns are shown in Figure 2. The training samples (red circles) were used to discover the patterns, so we expect them to perform well. The validation samples (blue) were not used for learning, only to test the discovered patterns. If we say that a subject will be declared AD if the test score is > 0.5 (the red line in Figure 2), we can see some errors: in the left panel we see one false positive (NL case with a high test score, blue triangle) and several false negatives (AD cases with low scores, red circles).
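The decision rule described above (declare AD when the test score exceeds 0.5) and the resulting error types can be sketched as follows. The scores and labels are invented for illustration and are not data from the study; the function name is hypothetical.

```python
def confusion_counts(scores, labels, threshold=0.5):
    """Count outcomes when a subject is declared AD if score > threshold."""
    counts = {"true_pos": 0, "false_pos": 0, "true_neg": 0, "false_neg": 0}
    for score, label in zip(scores, labels):
        declared_ad = score > threshold
        if declared_ad:
            # High score: correct for an AD case, a false positive for an NL case.
            counts["true_pos" if label == "AD" else "false_pos"] += 1
        else:
            # Low score: correct for an NL case, a false negative for an AD case.
            counts["false_neg" if label == "AD" else "true_neg"] += 1
    return counts


# Made-up scores for illustration only (NOT data from the study).
example_scores = [0.1, 0.7, 0.4, 0.9, 0.3, 0.8]
example_labels = ["NL", "NL", "NL", "AD", "AD", "AD"]
print(confusion_counts(example_scores, example_labels))
```

Raising the threshold trades false positives for false negatives, which matters if the goal is a highly specific test for clinical trial recruitment.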




Figure 1. The pictures used for recording samples: (a) for the famous cookie theft samples and (b) for the newly recorded samples.

Figure 2. Two discovered diagnostic patterns (left: with MMSE; right: without MMSE). The normal subjects are to the left in each plot (low scores) and the AD subjects to the right (high scores). No perfect pattern has yet been discovered.

As mentioned above, manually prepared transcripts were used for these results, since automatic speaker-independent speech recognition is very challenging for small highly variable data sets.  To be viable, the test should be completely automatic.  Accordingly, the main emphasis of the research presented at this conference is the design of an automatic speech-to-text system and automatic pause recognizer, taking into account the special features of the type of speech used for this test of dementia.