5aMU5 – Guitar String Sound Retrieved from Moving Pixels

Bożena Kostek – bokostek@audioakustyka.org
Audio Acoustics Laboratory
Faculty of Electronics, Telecommunications and Informatics
Gdansk University of Technology, Narutowicza 11/12
80-233 Gdansk, Poland

Piotr Szczuko – szczuko@sound.eti.pg.gda.pl
Józef Kotus – Joseph@sound.eti.pg.gda.pl
Maciej Szczodrak – szczodry@sound.eti.pg.gda.pl
Andrzej Czyżewski – andcz@sound.eti.pg.gda.pl
Multimedia Systems Department, Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, Narutowicza 11/12, 80-233 Gdansk, Poland

Popular version of paper 5aMU5, “Vibration analysis of acoustic guitar string employing high-speed video cameras”
Presented Friday morning, May 27, 2016, 9:00, Solitude Room
171st ASA Meeting, Salt Lake City

The aim of this study was to develop a method of visually recording and analyzing the vibrations of guitar strings using high-speed cameras and dedicated video processing algorithms. The recording of a plucked string reveals the way in which deformations propagate, forming standing and travelling waves. The paper compares the results for a few selected models of classical and acoustic guitars, and it involves converting the vibration image into a sound recording. The sound reconstructed in this way is compared with the sound recorded synchronously by a reference measurement microphone.

MEASUREMENT SETUP AND METHOD OF VIBRATIONS RECORDING
The measurements were made for three different models and types of guitars (Fig. 1a,b,c). The Martin D-45 is one of the best mass-produced acoustic guitars in the world. Its top plate is made from spruce, its sides and back from Indian rosewood, its neck from mahogany, and its fingerboard from ebony. The body shape is of the Dreadnought type. In the experiments, metal acoustic strings with a gauge of 0.52 were used.

The second instrument, the MGP 145 classic, is a prototype classical guitar with two tailpieces, made in 2014 by Sergiusz Stańczuk at the SEGA Luthier Guitar Studio in Warsaw. In the experiments, metal acoustic strings with a gauge of 0.52 were used.

The Defil guitar is a classical instrument made in 1978 by a Polish company and designed for amateur players. In the experiments, classic nylon strings with a gauge of 0.44 were used.


Fig. 1. Guitars under research: a) Martin Dreadnought D-45 acoustic guitar, b) SEGA MGP 145 classic, c) Defil classical guitar.

Acoustic guitars can be tested by acoustic methods based on recording the emitted sound, by mathematical modeling and simulation (including the finite element method), or by direct vibration measurement using various vibrometric methods (laser vibrometry, piezoelectric or electromagnetic transducers, analogue displacement meters, or digital high-speed cameras with optical displacement and deformation measurement).

Fig. 2 shows the layout of the experimental setup. Video tracking and measurement of vibration were performed with two identical, synchronized cameras acquiring video at 1,600 frames per second with a resolution of 2000×128 pixels and an exposure time of 100 µs. A high-class measurement microphone along with an acquisition system was used to record audio simultaneously with the video shooting. The cameras were placed side by side and oriented towards the string's anchor point at the bridge, the area above the sound hole, and the neck up to fret 19 (first camera), and towards the section of the neck from fret 19 to fret 6 (second camera).


Fig. 2. Setup for the video and audio recordings of the string vibrations.

METHOD OF RECONSTRUCTING SOUND FROM IMAGE
In order to accurately measure the deformation of the strings, a video analysis algorithm was created to determine the position of the elementary section of the string visible in each column of the image. Recording, lighting, and exposure conditions were set to ensure that the string was the brightest object in the image, so that the detection in each column yielded exactly one pixel. The results of the analysis from both cameras, i.e., two vectors describing the positions of the string sections, were combined into a single series.

The string at rest acts on the bridge with the force of its tension. Plucking the string deforms it, increasing the stress and delivering energy. After a substantial simplification of the analysis, it was possible to sum the deviations of each point on the string from its rest position and to convert the resulting value into one sound sample per video frame.
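
The procedure can be summarized in a short sketch. The Python snippet below is a minimal illustration of the approach described above, not the authors' implementation; the frame arrays, the rest-position vector, and the normalization step are assumptions.

    import numpy as np

    def string_positions(frame):
        """Return, for each image column, the row index of the brightest
        pixel, assumed to be the string (one detection per column)."""
        return np.argmax(frame, axis=0)

    def reconstruct_sound(frames_cam1, frames_cam2, rest_positions):
        """Convert string deformation into audio: for each synchronized
        pair of video frames, sum the deviations of all visible string
        points from their rest positions and treat the sum as one
        sound sample."""
        samples = []
        for f1, f2 in zip(frames_cam1, frames_cam2):
            # Combine the two cameras' position vectors into one series.
            positions = np.concatenate([string_positions(f1),
                                        string_positions(f2)])
            samples.append(np.sum(positions - rest_positions))
        samples = np.asarray(samples, dtype=float)
        return samples / np.max(np.abs(samples))  # normalize to [-1, 1]

Because one sample is produced per video frame, the sampling rate of the reconstructed sound equals the camera frame rate (1,600 Hz here), which is enough to capture the 110 Hz fundamental and its first few harmonics.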

ANALYSIS OF SOUND
Analysis of the averaged spectra highlights the differences between the sound reconstructed from the image and the sound recorded by the microphone (Fig. 3). The spectra were scaled so that the amplitude of the first harmonic, f = 110 Hz, was equal for both recordings.
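
A minimal Python sketch of such a comparison, assuming both signals are available as arrays; the Welch-averaging parameters are illustrative, not taken from the paper.

    import numpy as np
    from scipy.signal import welch

    def normalized_average_spectrum(x, fs, f0=110.0):
        """Welch-averaged magnitude spectrum, scaled so that the
        component nearest the first harmonic f0 has unit amplitude."""
        freqs, pxx = welch(x, fs=fs, nperseg=1024)
        mag = np.sqrt(pxx)
        return freqs, mag / mag[np.argmin(np.abs(freqs - f0))]

    # The two spectra can then be overlaid directly, e.g.:
    # f_mic, s_mic = normalized_average_spectrum(mic_signal, 48000)
    # f_opt, s_opt = normalized_average_spectrum(optical_signal, 1600)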

The Martin and SEGA guitars (Fig. 3a, 3b) have very thick acoustic strings, which do not deflect much. The Defil guitar (Fig. 3c) has soft classical strings that deform easily and vibrate with a large amplitude. The timbres of the generated sounds differ: the ratios between the harmonics are not preserved. This is due to the participation of the soundboard in the generation of sound.


Fig. 3. Comparison of the average spectra for the signals obtained from the microphone and reconstructed by the optical method: a) Martin Dreadnought D-45 acoustic guitar, b) SEGA MGP 145 classic, c) Defil classical guitar.

CONCLUSION

A method of obtaining the string deformation characteristics from an image and deriving sound samples from the observed vibrations was presented. Significant differences resulting from neglecting the influence of the soundboard were observed; therefore, further work in this area will focus on the systematic study of differences in the spectra and on modelling the participation of the guitar soundboard in the creation of sound.

ACKNOWLEDGEMENTS
This research study was supported by a grant funded by the Polish National Science Centre, decision number DEC-2012/05/B/ST7/02151.

The authors wish to thank Chris Gorski and Sergiusz Stańczuk for providing the guitars.

3pSC10 – Does increasing the playback speed of men’s and women’s voices reduce their intelligibility by the same amount?

Eric M. Johnson – eric.martin.johnson@utah.edu
Sarah Hargus Ferguson – sarah.ferguson@hsc.utah.edu

Department of Communication Sciences and Disorders
University of Utah
390 South 1530 East, Room 1201
Salt Lake City, UT 84112

Popular version of poster 3pSC10, “Gender and rate effects on speech intelligibility.”
Presented Wednesday afternoon, May 25, 2016, 1:00, Salon G
171st ASA Meeting, Salt Lake City

Older adults seeking hearing help often report having an especially hard time understanding women’s voices. However, this anecdotal observation doesn’t always agree with the findings from scientific studies. For example, Ferguson (2012) found that male and female talkers were equally intelligible for older adults with hearing loss. Moreover, several studies have found that young people with normal hearing actually understand women’s voices better than men’s voices (e.g. Bradlow et al., 1996; Ferguson, 2004). In contrast, Larsby et al. (2015) found that, when listening in background noise, groups of listeners with and without hearing loss were better at understanding a man’s voice than a woman’s voice. The Larsby et al. data suggest that female speech might be more affected by distortion like background noise than male speech is, which could explain why women’s voices may be harder to understand for some people.

We were interested to see if another type of distortion, speeding up the speech, would have an equal effect on the intelligibility of men and women. Speech that has been sped up (or time-compressed) has been shown to be less intelligible than unprocessed speech (e.g. Gordon-Salant & Friedman, 2011), but no studies have explored whether time compression causes an equal loss of intelligibility for male and female talkers. If an increase in playback speed causes women’s speech to be less intelligible than men’s, it could reveal another possible reason why so many older adults with hearing loss report difficulty understanding women’s voices. To this end, our study tested whether the intelligibility of time-compressed speech decreases for female talkers more than it does for male talkers.

Using 32 listeners with normal hearing, we measured how much the intelligibility of two men and two women went down when the playback speed of their speech was increased by 50%. These four talkers were selected based on their nearly equivalent conversational speaking rates. We used digital recordings of each talker and made two different versions of each sentence they spoke: a normal-speed version and a fast version. The software we used allowed us to speed up the recordings without making them sound high-pitched.

Audio sample 1: A sentence at its original speed.

Audio sample 2: The same sentence sped up to 50% faster than its original speed.
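
For readers curious how speech can be sped up without raising its pitch, here is a minimal Python sketch using the open-source librosa library; it is illustrative only, not necessarily the software used in the study, and the file names are placeholders.

    import librosa
    import soundfile as sf

    # Load a sentence recording at its native sampling rate
    # ("sentence.wav" is a placeholder file name).
    y, sr = librosa.load("sentence.wav", sr=None)

    # Phase-vocoder time stretch: rate=1.5 makes playback 50% faster
    # while leaving the pitch unchanged (no "chipmunk" effect).
    y_fast = librosa.effects.time_stretch(y, rate=1.5)

    sf.write("sentence_fast.wav", y_fast, sr)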

All of the sentences were presented to the listeners in background noise. We found that the men and women were essentially equally intelligible when listeners heard the sentences at their original speed. Speeding up the sentences made all of the talkers harder to understand, but the effect was much greater for the female talkers than the male talkers. In other words, there was a significant interaction between talker gender and playback speed. The results suggest that time-compression has a greater negative effect on the intelligibility of female speech than it does on male speech.


Figure 1: Overall percent correct key-word identification performance for male and female talkers in unprocessed and time-compressed conditions. Error bars indicate 95% confidence intervals.

These results confirm the negative effects of time-compression on speech intelligibility and imply that audiologists should counsel the communication partners of their patients to avoid speaking excessively fast, especially if the patient complains of difficulty understanding women’s voices. This counsel may be even more important for the communication partners of patients who experience particular difficulty understanding speech in noise.

  1. Bradlow, A. R., Torretta, G. M., and Pisoni, D. B. (1996). “Intelligibility of normal speech I: Global and fine-grained acoustic-phonetic talker characteristics,” Speech Commun. 20, 255-272.
  2. Ferguson, S. H. (2004). “Talker differences in clear and conversational speech: Vowel intelligibility for normal-hearing listeners,” J. Acoust. Soc. Am. 116, 2365-2373.
  3. Ferguson, S. H. (2012). “Talker differences in clear and conversational speech: Vowel intelligibility for older adults with hearing loss,” J. Speech Lang. Hear. Res. 55, 779-790.
  4. Gordon-Salant, S., and Friedman, S. A. (2011). “Recognition of rapid speech by blind and sighted older adults,” J. Speech Lang. Hear. Res. 54, 622-631.
  5. Larsby, B., Hällgren, M., Nilsson, L., and McAllister, A. (2015). “The influence of female versus male speakers’ voice on speech recognition thresholds in noise: Effects of low- and high-frequency hearing impairment,” Speech Lang. Hear. 18, 83-90.

2aSC7 – Effects of aging on speech breathing

Simone Graetzer, PhD. – sgraetz@msu.edu
Eric J. Hunter, PhD. – ejhunter@msu.edu

Voice Biomechanics and Acoustics Laboratory
Department of Communicative Sciences and Disorders
College of Communication Arts & Sciences
Michigan State University
1026 Red Cedar Road
East Lansing, MI 48824

Popular version of paper 2aSC7, entitled: “A longitudinal study of the effects of aging on speech breathing: Evidence of decreased expiratory volume in speech recordings”
Presented Tuesday morning, May 24, 2016, 8:00 – 11:30 AM, Salon F
171st ASA Meeting, Salt Lake City

Older adults are the fastest-growing segment of the population. Some voice, speech, and breathing disorders occur more frequently as individuals age. For example, lung capacity diminishes in older age due to loss of lung elasticity, which places an upper limit on utterance duration. Further, decreased lung and diaphragm elasticity and muscle strength can occur, and the rib cage can stiffen, leading to reductions in lung pressure and in the volume of air that can be expelled by the lungs (‘expiratory volume’). In the laryngeal system, tissues can break down and cartilages can harden, causing more voice breaks, increased hoarseness or harshness, reduced loudness, and pitch changes.

Our study attempted to identify the normal speech and respiratory changes that accompany aging in healthy individuals. Specifically, we examined how long individuals could speak in a single breath group, using series of speeches from six individuals (three females and three males) recorded over spans of 18 to 49 years. All speakers had been previously recorded in similar environments giving long monologue speeches; all but one spoke at a podium using a microphone, and most recordings were longer than 30 minutes. The speakers’ ages ranged from 43 to 98 years (averaging 51 at the earliest recordings and 84 at the latest). Samples of five minutes in length were extracted from each recording. Subsequently, for each subject, three raters identified the durations of exhalations during speech in these samples.
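
In the study, the breath groups were identified by human raters. Purely to illustrate the kind of measurement involved, the Python sketch below segments a recording into breath groups automatically by pause detection; the frame size, pause length, and level threshold are assumptions, not values from the study.

    import numpy as np

    def breath_group_durations(x, fs, frame_ms=25.0, pause_s=0.3,
                               thresh_db=-40.0):
        """Estimate breath-group durations (seconds) by pause detection.
        Frames whose RMS level falls more than |thresh_db| below the
        peak are treated as silence; silent runs of at least pause_s
        seconds are taken as breath pauses, and the stretches of speech
        between them as breath groups."""
        frame = max(1, int(fs * frame_ms / 1000.0))
        n = len(x) // frame
        rms = np.sqrt(np.mean(x[:n * frame].reshape(n, frame) ** 2, axis=1))
        level_db = 20.0 * np.log10(rms / (rms.max() + 1e-12) + 1e-12)
        voiced = level_db > thresh_db

        min_pause = int(round(pause_s * fs / frame))  # pause length in frames
        durations, start, silence_run = [], None, 0
        for i, v in enumerate(voiced):
            if v:
                if start is None:
                    start = i        # a new breath group begins
                silence_run = 0      # short pauses inside a group are spanned
            else:
                silence_run += 1
                if start is not None and silence_run >= min_pause:
                    end = i - silence_run + 1    # one past the last voiced frame
                    durations.append((end - start) * frame / fs)
                    start = None
        if start is not None:        # close the final group
            durations.append((len(voiced) - start) * frame / fs)
        return durations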

Two figures illustrate how the breath groups changed with age for one of the women (Figure 1) and one of the men (Figure 2). We found a change in speech breathing, which might be caused by a less flexible rib cage and the loss of vital capacity and expiratory volume. In males especially, it may also have been caused by poor closure of the vocal folds, resulting in more air leakage during speech. Specifically, we found decreased breath group durations for all male subjects after 70 years of age, with overall durations averaging between 1 and 3.5 seconds; the point of change appeared to occur between 60 and 65 years. For females, this change occurred later, between 60 and 70 years, with durations averaging between 1.5 and 3.5 seconds.


Figure 1. For one of the women, the speech breath groups were measured and plotted against age. The length of the speech breath groups begins to decrease at about 68 years of age.


Figure 2. For one of the men, the speech breath groups were measured and plotted against age. The length of the speech breath groups begins to decrease at about 66 years of age.

The study results indicate decreases in speech breath group duration for most individuals as their age increased (especially from 65 years onwards), consistent with the age-related decline in expiratory volume reported in other studies. Typically, the speech breath group duration of the six subjects decreased from ages 65 to 70 years onwards. There was some variation between individuals in the point at which the durations started to decrease. The decreases indicate that, as they aged, speakers could not sustain the same number of words in a breath group and needed to inhale more frequently while speaking.

Future studies involving more participants may further our understanding of normal age-related changes vs. pathology, but such a corpus of recordings must first be constrained on the basis of communicative intent, venues, knowledge of vocal coaching, and related information.

References
Hunter, E. J., Tanner, K., & Smith, M. E. (2011). Gender differences affecting vocal health of women in vocally demanding careers. Logopedics Phoniatrics Vocology, 36(3), 128-136.

Janssens, J. P., Pache, J. C., & Nicod, L. P. (1999). Physiological changes in respiratory function associated with ageing. European Respiratory Journal, 13, 197-205.

Acknowledgements
We acknowledge the efforts of Amy Kemp, Lauren Glowski, Rebecca Wallington, Allison Woodberg, Andrew Lee, Saisha Johnson, and Carly Miller. Research was supported in part by the National Institute on Deafness and Other Communication Disorders of the National Institutes of Health under Award Number R01DC012315. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

1pAA6 – Listening for solutions to a speech intelligibility problem

Anthony Hoover, FASA – thoover@mchinc.com
McKay Conant Hoover, Inc.
Acoustics & Media Systems Consultants
5655 Lindero Canyon Road, Suite 325
Westlake Village, CA 91362

Popular version of paper 1pAA6, “Listening for solutions to a speech intelligibility problem”
Presented Monday afternoon, May 23, 2016, 2:45 in Salon E
171st ASA Meeting in Salt Lake City, UT

Loudspeakers for sound reinforcement systems are designed to project their sound in specific directions. Sound system designers take advantage of the “directivity” characteristics of these loudspeakers, aiming their sound uniformly throughout seating areas, while avoiding walls and ceilings and other surfaces from which undesirable reflections could reduce clarity and fidelity.

Many high-quality sound reinforcement loudspeaker systems incorporate horn loudspeakers that provide very good control, but these are relatively large and conspicuous. In recent years, “steerable column arrays” have become available; these are tall but narrow, allowing them to blend better into the architectural design. They are well suited to the frequency range of speech, and to some degree their sound output can be steered up or down using electronic signal processing.


Figure 1. Steerable column arrays.

Figure 1 illustrates the steering technique, with six individual loudspeakers in a vertical array. Each loudspeaker generates an ever-expanding sphere of sound (simplified in this figure to show only the horizontal diameter of each sphere), propagating outward at the speed of sound, which is roughly 1 foot per millisecond. In the “not steered” column, all of the loudspeakers output their sound at the same time, and the combined wavefront spreads horizontally as an ever-expanding cylinder of sound. In the “steered downward” column, the electronic signal to each successively lower loudspeaker is slightly delayed; the top loudspeaker outputs its sound first, and each lower loudspeaker in turn outputs its sound a little later, so that the sound energy is steered slightly downward. This steering allows some flexibility in positioning the loudspeaker column. However, these systems offer only some vertical control; left-to-right projection is not well controlled.
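
The delay arithmetic behind this steering is simple, as the Python sketch below shows; the element count, spacing, and steering angle are illustrative assumptions, not values from an actual product.

    import math

    SPEED_OF_SOUND = 343.0  # m/s, roughly 1 foot per millisecond

    def steering_delays(n_elements, spacing_m, angle_deg):
        """Per-element delays (seconds) that tilt a vertical column
        array's wavefront downward by angle_deg. Element 0 is the top
        loudspeaker and fires first; each lower element fires a little
        later, exactly as described above."""
        angle = math.radians(angle_deg)
        return [i * spacing_m * math.sin(angle) / SPEED_OF_SOUND
                for i in range(n_elements)]

    # Example: six loudspeakers spaced 0.2 m apart, steered 10 degrees down.
    for i, delay in enumerate(steering_delays(6, 0.2, 10.0)):
        print(f"element {i}: {delay * 1000:.3f} ms")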

Steerable column arrays have reasonably resolved speech reinforcement issues in many large, acoustically problematic spaces. Such arrays were an appropriate selection for a large worship space, with a balcony and a huge dome, that had undergone a comprehensive renovation. Unfortunately, in this case, problems with speech intelligibility persisted even after multiple adjustments by reputable technicians, whose instrumentation had identified several sidewall surfaces that appeared to be reflecting sound and causing problematic echoes. They recommended additional sound-absorptive treatment, which could adversely affect the visual aesthetics and negatively impact the popular classical music concerts.

When we visited the space, as requested, to investigate potential acoustical treatments, we found speech difficult to understand in various areas of the main floor. While playing a click track (imagine a “pop” every 5 seconds) through the sound system and listening around the main floor, we heard strong echoes emanating from the direction of the surfaces that had been recommended for sound-absorptive treatment.

Near those surfaces, additional column loudspeakers had been installed to augment coverage of the balcony seating area. These balcony loudspeakers were time-delayed (in accordance with common practice, to accommodate the speed of sound) so that they would not produce their sound until the sound from the main loudspeakers had arrived at the balcony. With the proper time delay, listeners on the balcony hear the sound from the main and balcony loudspeakers at approximately the same time, and thereby do not perceive what would otherwise seem like an echo from the main loudspeakers.
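
As a worked example under assumed geometry (the actual distances in this room are not given here): if the main loudspeakers are 30 m from a balcony listener and the balcony fill loudspeaker is 3 m away, the fill should be delayed by roughly 79 ms.

    SPEED_OF_SOUND = 343.0  # m/s

    # Assumed distances, for illustration only:
    main_to_listener = 30.0  # m, main loudspeakers to a balcony listener
    fill_to_listener = 3.0   # m, balcony fill loudspeaker to that listener

    # Delay the fill so its sound arrives together with the mains' sound.
    delay_ms = (main_to_listener - fill_to_listener) / SPEED_OF_SOUND * 1000
    print(f"balcony fill delay: {delay_ms:.1f} ms")  # -> 78.7 ms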

With more listening, it became clear that the echo was not due to reflections from the walls at all, but rather to the delayed balcony loudspeakers’ sound inadvertently spraying back toward the main seating area. These loudspeakers cannot be steered in a multifaceted manner that would both cover the balcony and avoid the main floor.

We simply turned off the balcony loudspeakers, and the echo disappeared.  More importantly, speech intelligibility improved significantly throughout the main floor. Intelligibility throughout the balcony remained acceptable, albeit not quite as good as with the balcony loudspeakers operating.

The general plan is to remove the balcony loudspeakers and relocate them to the same wall as the main loudspeakers, but steer them to cover the balcony.

Adding sound-absorptive treatment on the side walls would not have solved the problem, and would have squandered funds while impacting the visual aesthetics and classical music programming.  Listening for solutions proved to be more effective than interpreting test results from sophisticated instrumentation.

5aSCb17 – Pronunciation differences: Gender and ethnicity in Southern English

Wendy Herd – wherd@english.msstate.edu
Devan Torrence – dct74@msstate.edu
Joy Carino – carinoj16@themsms.org

Linguistics Research Laboratory
English Department
Mississippi State University
Mississippi State, MS 39762

Popular version of paper 5aSCb17, “Prevoicing differences in Southern English: Gender and ethnicity effects”
Presented Friday morning, May 27, 10:05 – 12:00 in Salon F
171st ASA Meeting, Salt Lake City

We often notice differences in pronunciation between ourselves and other speakers. More noticeable differences, like the Southern drawl or the New York City pronunciation yuge instead of huge, are even used overtly when we guess where a given speaker is from. Our speech also varies in more subtle ways.

If you hold your hand in front of your mouth when saying tot and dot aloud, you will be able to feel a difference in the onset of vocal fold vibration. Tot begins with a sound that lacks vocal fold vibration, so a large rush of air can be felt on the hand at the beginning of the word. No such rush of air can be felt at the beginning of dot because it begins with a sound with vocal fold vibration. A similar difference can be felt when comparing [p] of pot to [b] of bot and [k] of cot to [ɡ] of got. This difference between [t] and [d] is very noticeable, but the timing of our vocal fold vibration also varies each time we pronounce a different version of [t] or [d].

Our study focuses not on the large difference between sounds like [t] and [d], but on how speakers produce the smaller differences among different [d] pronunciations. For example, an English [d] might be pronounced with no vocal fold vibration before the [d], as shown in Figure 1(a), or with vocal fold vibration before the [d], as shown in Figure 1(b). As can be heard in the accompanying sound files, the difference between these two [d] pronunciations is less noticeable for English speakers than the difference between [t] and [d].


Figure 1. Spectrogram of (a) dot with no vocal fold vibration before [d] and (b) dot with vocal fold vibration before [d]. (Only the first half of dot is shown.)

We compared the pronunciations of 40 native speakers of English from Mississippi to see if some speakers were more likely to vibrate their vocal folds before [b, d, ɡ] rather than shortly after those sounds. These speakers included equal numbers of African American participants (10 women, 10 men) and Caucasian American participants (10 women, 10 men).

Previous research found that men were more likely to vibrate their vocal folds before [b, d, ɡ] than women, but we found no such gender differences [1]. Men and women from Mississippi employed vocal fold vibration similarly. Instead, we found a clear effect of ethnicity. African American participants produced vocal fold vibration before initial [b, d, ɡ] 87% of the time while Caucasian American participants produced vocal fold vibration before these sounds just 37% of the time. This striking difference, which can be seen in Figure 2, is consistent with a previous smaller study that found ethnicity effects in vocal fold vibration among young adults from Florida [1, 2]. It is also consistent with descriptions of regional variation in vocal fold vibration [3].

Figure 2. Percentage of pronunciations produced with vocal fold vibration before [b, d, ɡ] displayed by ethnicity and gender.
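
To make the computation behind these percentages concrete, here is a toy Python sketch: a token's voice onset time (VOT) is negative when vocal fold vibration begins before the stop release (prevoicing), and each group's rate is simply the share of negative values. The VOT numbers below are invented placeholders, not the study's data.

    import numpy as np

    # Hypothetical voice onset times (ms) relative to the stop release;
    # negative values mean voicing began BEFORE the release (prevoicing).
    vot_ms = {
        "African American": np.array([-85.0, -60.0, -92.0, 15.0, -70.0]),
        "Caucasian American": np.array([12.0, -55.0, 20.0, 18.0, -40.0]),
    }

    for group, vots in vot_ms.items():
        prevoiced_pct = 100.0 * np.mean(vots < 0)
        print(f"{group}: {prevoiced_pct:.0f}% of tokens prevoiced")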

The results suggest that these pronunciation differences are due to dialect variation. African American speakers from Mississippi appear to systematically use vocal fold vibration before [b, d, ɡ] to differentiate them from [p, t, k], but the Caucasian American speakers are using the cue differently and less frequently. Future research in the perception of these sounds could shed light on how speakers of different dialects vary in the way they interpret this cue. For example, if African American speakers are using this cue to differentiate [d] from [t], but Caucasian American speakers are using the same cue to add emphasis or to convey emotion, it is possible that listeners sometimes use these cues to (mis)interpret the speech of others without ever realizing it. We are currently attempting to replicate these results in other regions.

Each accompanying sound file contains two repetitions of the same word. The first repetition does not include vocal fold vibration before the initial sound, and the second repetition does include vocal fold vibration before the initial sound.

  1. Ryalls, J., Zipprer, A., & Baldauff, P. (1997). A preliminary investigation of the effects of gender and race on voice onset time. Journal of Speech, Language, and Hearing Research, 40(3), 642-645.
  2. Ryalls, J., Simon, M., & Thomason, J. (2004). Voice onset time production in older Caucasian- and African-Americans. Journal of Multilingual Communication Disorders, 2(1), 61-67.
  3. Jacewicz, E., Fox, R.A., & Lyle, S. (2009). Variation in stop consonant voicing in two regional varieties of American English. Journal of the International Phonetic Association, 39(3), 313-334.