2aMU5 – Do people find vocal fry in popular music expressive?

Mackenzie Parrott – mackenzie.lanae@gmail.com
John Nix – john.nix@utsa.edu

Popular version of paper 2aMU5, “Listener Ratings of Singer Expressivity in Musical Performance.”
Presented Tuesday, May 24, 2016, 10:20-10:35 am, Salon B/C, ASA meeting, Salt Lake City

Vocal fry is the lowest register of the human voice.  Its distinct sound is characterized by a low rumble interspersed with uneven popping and crackling.  The use of fry as a vocal mannerism is becoming increasingly common in American speech, fueling discussion about the implications of its use and how listeners perceive the speaker [1].  Previous studies have suggested that listeners find vocal fry to be generally unpleasant in women’s speech, but associate it with positive characteristics in men’s speech [2].

As it has become more prevalent, fry has, perhaps not surprisingly, found its place in many commercial song styles as well.  Many singers use fry as a stylistic device at the onset or offset of a sung tone.  The device is especially common in popular musical styles, where it is presumably meant to heighten the emotion the performer is trying to convey.

Researchers at the University of Texas at San Antonio conducted a survey to analyze whether listeners’ ratings of a singer’s expressivity in musical samples in two contemporary commercial styles (pop and country) were affected by the presence of vocal fry, and to see if there was a difference in listener ratings according to the singer’s gender.  A male and a female singer recorded musical samples for the study in a noise reduction booth.  As can be seen in the table below, the singers were asked to sing most of the musical selections twice, once using vocal fry at phrase onsets, and once without fry, while maintaining the same vocal quality, tempo, dynamics, and stylization.  Some samples were presented more than one time in the survey portion of the study to test listener reliability.
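The text says some samples were repeated to test listener reliability but does not say which statistic was used. One common choice is a test-retest correlation between a listener's first and second ratings of the same clips. The sketch below assumes Pearson's r and uses hypothetical ratings on an assumed 1-10 scale, not the study's data.

```python
from math import sqrt


def pearson_r(first: list[float], second: list[float]) -> float:
    """Pearson correlation between ratings of the same clips heard twice."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length rating lists (at least 2 clips)")
    mean_1 = sum(first) / len(first)
    mean_2 = sum(second) / len(second)
    cov = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    spread_1 = sqrt(sum((a - mean_1) ** 2 for a in first))
    spread_2 = sqrt(sum((b - mean_2) ** 2 for b in second))
    return cov / (spread_1 * spread_2)


# Hypothetical ratings (assumed 1-10 scale) from one listener who heard
# the same five clips twice during the survey.
first_pass = [6, 7, 4, 8, 5]
second_pass = [6, 8, 4, 7, 5]
print(round(pearson_r(first_pass, second_pass), 2))  # 0.9
```

Values near 1 would indicate that a listener rated repeated clips consistently; values near 0 would suggest the ratings are not reliable.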

Song                          Singer Gender   Vocal Mode
(Hit Me) Baby One More Time   Female          Fry Only
If I Die Young                Female          With and Without Fry
National Anthem               Female          With and Without Fry
Thinking Out Loud             Male            Without Fry Only
Amarillo By Morning           Male            With and Without Fry
National Anthem               Male            With and Without Fry

Across all listener ratings of all the songs, the recordings that included vocal fry were rated as only slightly more expressive than the recordings without it.  There were, however, some differences between the male and the female singer.  Listeners rated the samples in which the female singer used vocal fry higher (i.e., more expressive) than those without fry, which was surprising given the negative associations with women’s use of vocal fry in speech.  Conversely, listeners rated the male singer’s samples without fry as more expressive than those with fry.  Part of this preference pattern may also reflect differences between the singers themselves: the male singer was much more experienced with pop styles than the female singer, who is primarily classically trained.  The overall expressivity ratings for the male singer were higher than those for the female singer by a statistically significant margin.

Listener ratings also showed trends across age groups.  Younger listeners widened both preference gaps: between the female singer’s performances with and without fry, and between the male singer’s performances without and with fry.  Presumably they are more attuned to the stylistic norms of current pop singers, although this pattern could also imply a gender bias among younger listeners.  The older listener groups gave the performers lower mean expressivity ratings than the younger listener groups did.  Since most of the songs we sampled were produced fairly recently, this may indicate a generational trend in preference: perhaps listeners rate the style of vocal production most similar to what they listened to as young adults as the most expressive style of singing.  These findings raise many questions for further studies of vocal fry in pop and country music.


  1. Anderson, R.C., Klofstad, C.A., Mayew, W.J., Venkatachalam, M. “Vocal Fry May Undermine the Success of Young Women in the Labor Market.” PLoS ONE, 2014. 9(5): e97506. doi:10.1371/journal.pone.0097506.
  2. Yuasa, I. P. “Creaky Voice: A New Feminine Voice Quality for Young Urban-Oriented Upwardly Mobile American Women.” American Speech, 2010. 85(3): 315-337.

2aPA8 – Taming Tornadoes: Controlled Trapping and Rotation with Acoustic Vortices

Asier Marzo – amarzo@hotmail.com
Mihai Caleap
Bruce Drinkwater

Bristol University
Senate House, Tyndall Ave,
Bristol, United Kingdom

Popular version of paper 2aPA8, “Taming tornadoes: Controlling orbits inside acoustic vortex traps”
Presented Tuesday morning, May 24, 2016, 11:05 AM, Salon H
171st ASA Meeting Salt Lake City

Tractor beams are mysterious beams that can pull objects toward the source of the emission (Figure 1). These beams have attracted the attention of both scientists and sci-fi fans. For instance, the tractor beam is an iconic device in Star Wars and Star Trek, where big spaceships use it to trap and capture smaller objects.


Figure 1. A sonic tractor beam working in air.

Scientists have studied tractor beams theoretically for decades, and in 2014 a tractor beam made with light was realized [1]. It used the energy of photons bouncing off a microsphere to keep the sphere trapped laterally, while heating the back of the sphere with different light patterns to pull it toward the laser source. The sphere had a diameter of 50 micrometres and was made of glass coated with gold.

A tractor beam made with light can only manipulate very small particles made of specific materials. A tractor beam that uses mechanical waves (i.e., sound or ultrasound) would enable trapping of a much wider range of particle sizes and allow almost any combination of particle and host fluid materials, for example drug delivery agents within the human body.

Recent experiments have shown that a vortex beam can act as a tractor beam both in air [2] and in water [3]. A vortex beam (such as a first-order Bessel beam) is analogous to a tornado of sound: it is hollow in the middle and spirals about a central axis, and particles get trapped in the calm eye of the tornado (Figure 2).


Figure 2. Intensity iso-surface of an acoustic vortex. Fifty-four ultrasonic speakers emitting at 40 kHz and arranged in a hemisphere (see [2] for fuller details) create an acoustic vortex that traps the particle in the middle.
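One common way to drive a phased array as an acoustic vortex is to give each transducer a phase that combines a focusing term (so all waves arrive in step at the trap) with a helical term equal to the topological charge times the transducer's azimuth around the trap axis. The sketch below computes such phases for 54 emitters at 40 kHz, as in the caption. The ring layout, radius, trap position, speed of sound, and phase sign convention are hypothetical assumptions; [2] describes the actual array and method.

```python
import numpy as np

SPEED_OF_SOUND = 343.0                      # m/s in air (assumed, about 20 °C)
FREQUENCY = 40e3                            # Hz, as in Figure 2
K = 2 * np.pi * FREQUENCY / SPEED_OF_SOUND  # wavenumber in rad/m


def hemisphere_rings(radius=0.09, elevations_deg=(20.0, 45.0, 70.0), per_ring=18):
    """Hypothetical layout: 3 rings x 18 = 54 transducers on a hemisphere."""
    points = []
    for elev in np.radians(elevations_deg):
        for az in np.linspace(0.0, 2 * np.pi, per_ring, endpoint=False):
            points.append([radius * np.cos(elev) * np.cos(az),
                           radius * np.cos(elev) * np.sin(az),
                           radius * np.sin(elev)])
    return np.array(points)


def vortex_phases(positions, focus, charge=1):
    """Emission phase (rad) per transducer for a vortex centred on `focus`.

    The focusing term cancels each path's propagation phase so all waves
    arrive in step at the focus; the helical term adds charge * azimuth,
    so the phase winds around the vertical axis through the focus.
    The sign of `charge` sets the direction in which the vortex turns.
    """
    rel = positions - focus
    focusing = -K * np.linalg.norm(rel, axis=1)
    helical = charge * np.arctan2(rel[:, 1], rel[:, 0])
    return np.mod(focusing + helical, 2 * np.pi)


array = hemisphere_rings()
focus = np.array([0.0, 0.0, 0.03])          # hypothetical trap position (m)
plus_vortex = vortex_phases(array, focus, charge=1)
minus_vortex = vortex_phases(array, focus, charge=-1)
plain_focus = vortex_phases(array, focus, charge=0)  # ordinary focal point
```

Flipping the sign of `charge` reverses the winding of the phase, which is the handle used to switch the vortex direction.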

The problem is that only very small particles are stably trapped inside the vortex. As the particles get bigger, they start to spin and orbit until they are ejected (Figure 3). As in a tornado, only the small particles remain within the vortex, whereas the larger ones get ejected.


Figure 3. Particle behaviour depending on size: a small particle is trapped (a), a medium-sized particle orbits (b), and a large particle is ejected (c).

Here we show that, unlike a tornado, an acoustic vortex can have its direction changed thousands of times per second. In our paper, we prove that by rapidly switching the direction of the acoustic vortex it is possible to trap particles of various sizes stably. Furthermore, by adjusting the proportion of time that each vortex direction is emitted, the spinning speed of the particle can be controlled (Figure 4).


Figure 4. Taming the vortex: a) the vortex rotates all the time in the same direction and this rotation is transferred to the particle. b) the vortex switches direction and thus the angular momentum is completely or partially cancelled, providing rotational control.
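A simple way to see why switching tames the vortex: if the two vortex directions exert equal and opposite torques and the switching is much faster than the particle can spin up, the particle responds to the time-averaged torque, which scales as (2d - 1) when one direction is emitted for a fraction d of the time. The sketch below assumes that idealization plus linear rotational drag. Every numerical value (torque, drag, moment of inertia, switching rate) is made up for illustration and is not taken from the paper.

```python
def net_torque(duty: float, torque_per_vortex: float) -> float:
    """Time-averaged torque when the +l vortex is emitted for a fraction `duty`
    of each switching cycle and the opposite-handed -l vortex for the rest."""
    if not 0.0 <= duty <= 1.0:
        raise ValueError("duty must lie between 0 and 1")
    return torque_per_vortex * (duty - (1.0 - duty))


def spin_rate(duty: float, torque: float = 1e-9, drag: float = 1e-10,
              inertia: float = 1e-12, switch_hz: float = 2000.0,
              t_end: float = 0.1, dt: float = 1e-5) -> float:
    """Angular velocity (rad/s) after t_end seconds, from Euler integration of
    inertia * dw/dt = torque(t) - drag * w with square-wave vortex switching."""
    steps_per_cycle = round(1.0 / (switch_hz * dt))
    omega = 0.0
    for n in range(round(t_end / dt)):
        # First `duty` fraction of each cycle: +l vortex; remainder: -l vortex.
        forward = (n % steps_per_cycle) < duty * steps_per_cycle
        applied = torque if forward else -torque
        omega += dt * (applied - drag * omega) / inertia
    return omega


for duty in (1.0, 0.75, 0.5):
    print(duty, round(spin_rate(duty), 2), net_torque(duty, 1e-9) / 1e-10)
```

With these made-up values the spin rate settles near net_torque(d, 1e-9) / 1e-10, about 10 x (2d - 1) rad/s, so an even split (d = 0.5) all but stops the rotation while d = 1 recovers a single-direction vortex.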

The ability to levitate particles such as liquids, crystals, or even living cells inside acoustic vortices, and to rotate them in a controlled way, enables new possibilities and processes for a variety of disciplines.


  1. Shvedov, V., Davoyan, A. R., Hnatovsky, C., Engheta, N., & Krolikowski, W. (2014). A long-range polarization-controlled optical tractor beam. Nature Photonics, 8(11), 846-850.
  2. Marzo, A., Seah, S. A., Drinkwater, B. W., Sahoo, D. R., Long, B., & Subramanian, S. (2015). Holographic acoustic elements for manipulation of levitated objects. Nature Communications, 6.
  3. Baresch, D., Thomas, J. L., & Marchiano, R. (2016). Observation of a single-beam gradient force acoustical trap for elastic particles: acoustical tweezers. Physical Review Letters, 116(2), 024301.

1aPP44 – What’s That Noise? The Effect of Hearing Loss and Tinnitus on Soldiers Using Military Headsets

Candice Manning, AuD, PhD – Candice.Manning@va.gov
Timothy Mermagen, BS – timothy.j.mermagen.civ@mail.mil
Angelique Scharine, PhD – angelique.s.scharine.civ@mail.mil

Human and Intelligent Agent Integration Branch (HIAI)
Human Research and Engineering Directorate
U.S. Army Research Laboratory
Building 520
Aberdeen Proving Ground, MD

Lay language paper 1aPP44, “Speech recognition performance of listeners with normal hearing, sensorineural hearing loss, and sensorineural hearing loss and bothersome tinnitus when using air and bone conduction communication headsets”
Presented Monday Morning, May 23, 2016, 8:00 – 12:00, Salon E/F
171st ASA Meeting, Salt Lake City

Military personnel are at high risk for noise-induced hearing loss because of the unprecedented proportion of blast-related acoustic trauma experienced during deployment from high-level impulsive and continuous noise (e.g., transportation vehicles, weaponry, blast exposure).  In fact, noise-induced hearing loss is the primary injury of United States Soldiers returning from Afghanistan and Iraq.  Ear injuries, including tympanic membrane perforation, hearing loss, and tinnitus, greatly affect a Soldier’s hearing acuity and, as a result, reduce situational awareness and readiness.  Hearing protection devices are accessible to military personnel; however, it has been noted that many troops forgo protection, believing it may reduce their responsiveness during combat.

Noise-induced hearing loss is strongly associated with tinnitus, the perception of sound that is not produced by a source outside the body.  Chronic tinnitus causes functional impairment that may lead a sufferer to seek help from an audiologist or other healthcare professional.  Because there is no cure for this condition, intervention and management are the only options for individuals with chronic tinnitus.  Tinnitus affects every aspect of an individual’s life, including sleep, daily tasks, relaxation, and conversation, to name only a few.  In 2011, a United States Government Accountability Office report on noise indicated that tinnitus was the most prevalent service-connected disability.  The combination of noise-induced hearing loss and tinnitus could greatly impair a Soldier’s ability to process speech information rapidly and accurately in high-stress situations.

The prevalence of hearing loss and tinnitus within the military population suggests that Soldiers’ use of hearing protection is extremely important.  Integrating hearing protection into reliable communication devices is likely to increase the probability that Soldiers use it.  Military communication devices using air and bone conduction provide clear two-way audio communication through a headset and a microphone.

Air-conduction headsets offer passive hearing protection from high ambient noise, and talk-through microphones allow the user to engage in face-to-face conversation and hear ambient environmental sounds, preserving situational awareness.  Bone-conduction technology uses the bone-conduction pathway and presents auditory information differently than air-conduction devices do (see Figure 1).  Because bone-conduction headsets do not cover the ears, they let the user hear the surrounding environment while keeping the option to communicate over a radio network.  Worn with or without hearing protection, bone-conduction devices are inconspicuous and fit easily under the helmet.  Bone-conduction communication devices have been used in the past; however, even as newer devices have been designed, they have not been widely adopted for military applications.


Figure 1. Air- and bone-conduction headsets used during the study: a) Invisio X5 dual in-ear headset and X50 control unit and b) Aftershockz Sports 2 headset.

Because many military personnel operate in high-noise environments and have some degree of noise-induced hearing damage and/or tinnitus, it is important to understand how speech recognition performance might change as a function of headset use.  This is an important subject to evaluate because two auditory pathways (the air-conduction pathway and the bone-conduction pathway) are responsible for hearing perception.  Comparing air- and bone-conduction devices across different hearing populations will help describe the overall effects not only of hearing loss, an extremely common disability within the military population, but also of tinnitus on situational awareness.  Additionally, if the two types of headsets differ, this information will help guide future communication device selection for each type of population: normal hearing (NH), sensorineural hearing loss (SNHL), and SNHL with tinnitus.

Findings from the speech-in-noise literature indicate that communication devices have a negative effect on speech intelligibility within the military population when noise is present.  However, it is uncertain how hearing loss and/or tinnitus affects speech intelligibility and situational awareness in high-level noise environments.  This study measured speech recognition of words presented over air-conduction (AC) and bone-conduction (BC) headsets in three groups of listeners: normal hearing, sensorineural hearing loss, and sensorineural hearing loss with bothersome tinnitus.  Three speech-to-noise ratios (SNR = 0, -6, and -12 dB) were created by embedding speech items in pink noise.  Overall, performance was marginally but significantly better for the Aftershockz bone-conduction headset (Figure 2).  As expected, performance increased as the speech-to-noise ratio increased (Figure 3).
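The study embedded speech in pink noise at fixed speech-to-noise ratios. The sketch below shows one standard way to do that with NumPy: shape white noise to a 1/f power spectrum, then scale it so the RMS ratio of speech to noise matches the target SNR in dB. The RMS-based SNR definition, the FFT shaping method, and the tone standing in for a recorded word are assumptions; the text does not describe the study's stimulus generation or calibration.

```python
import numpy as np


def pink_noise(n: int, rng: np.random.Generator) -> np.ndarray:
    """Unit-RMS pink noise: white noise shaped to a 1/f power spectrum."""
    spectrum = np.fft.rfft(rng.standard_normal(n))
    bins = np.arange(len(spectrum), dtype=float)
    bins[0] = 1.0                       # avoid dividing by zero at DC
    spectrum /= np.sqrt(bins)           # amplitude ~ 1/sqrt(f), so power ~ 1/f
    spectrum[0] = 0.0                   # remove any DC offset
    noise = np.fft.irfft(spectrum, n)
    return noise / np.sqrt(np.mean(noise ** 2))


def mix_at_snr(speech: np.ndarray, snr_db: float, rng: np.random.Generator) -> np.ndarray:
    """Add pink noise scaled so that 20*log10(rms_speech / rms_noise) = snr_db."""
    speech_rms = np.sqrt(np.mean(speech ** 2))
    noise_rms = speech_rms / 10 ** (snr_db / 20)
    return speech + noise_rms * pink_noise(len(speech), rng)


# Usage: a 1-second, 500 Hz tone stands in for a recorded word (hypothetical).
rng = np.random.default_rng(0)
fs = 16000
tone = 0.1 * np.sin(2 * np.pi * 500 * np.arange(fs) / fs)
for snr_db in (0, -6, -12):
    mixed = mix_at_snr(tone, snr_db, rng)
    residual = mixed - tone
    measured = 20 * np.log10(np.sqrt(np.mean(tone ** 2)) / np.sqrt(np.mean(residual ** 2)))
    print(snr_db, round(measured, 2))
```

At -12 dB the noise RMS is about four times the speech RMS, which is why intelligibility drops sharply at the lowest SNR.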


Figure 2. Mean rationalized arcsine units measured for each of the tactical communication and protective systems (TCAPS) under test.


Figure 3. Mean rationalized arcsine units measured as a function of speech to noise ratio.

One of the most fascinating things about the data is that although the effect of hearing profile was statistically significant, it was not practically significant: the means for the Normal Hearing, Hearing Loss, and Tinnitus groups were 65, 61, and 63, respectively (Figure 4).  Nor did hearing profile interact with any of the other variables under test.  One might conclude from the data that if the listener can control the presentation level, the speech-to-noise ratio has about the same effect regardless of hearing loss.  Hearing profile made no practical difference to performance with the TCAPS; however, the Aftershockz headset provided better speech intelligibility for all listeners.


Figure 4. Mean rationalized arcsine units observed as a function of the hearing profile of the listener.
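Scores in these figures are reported in rationalized arcsine units (RAU), a transform that stabilizes the variance of percent-correct scores near 0% and 100% while staying close to percent correct in the midrange. The sketch below implements the widely used Studebaker (1985) formula; the text does not state how many words were scored per condition, so the counts in the usage example are hypothetical.

```python
import math


def rau(correct: int, total: int) -> float:
    """Rationalized arcsine units (Studebaker, 1985) for `correct` of `total` items."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need total > 0 and 0 <= correct <= total")
    theta = (math.asin(math.sqrt(correct / (total + 1)))
             + math.asin(math.sqrt((correct + 1) / (total + 1))))  # radians
    return (146.0 / math.pi) * theta - 23.0


# Hypothetical word-list scores out of 50 items.
for correct in (0, 25, 40, 50):
    print(correct, round(rau(correct, 50), 1))
```

A score of 25 out of 50 maps to exactly 50 RAU, while perfect and zero scores map to values beyond the 0 to 100 range, which is what keeps differences near the extremes from being compressed.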

2aAAa7 – Gunshot recordings from a criminal incident: who shot first?

Robert C. Maher – rob.maher@montana.edu
Electrical & Computer Engineering Department
Montana State University
P.O. Box 173780
Bozeman, MT 59717-3780

Popular version of paper 2aAAa7, “Gunshot recordings from a criminal incident: Who shot first?”
Presented Tuesday morning, May 24, 2016, 10:20 AM, Salon E
171st ASA Meeting, Salt Lake City

In the United States, criminal actions involving firearms are of ongoing concern to law enforcement and the public.  The FBI’s 2013 National Incident-Based Reporting System (NIBRS) report lists 50,721 assault incidents and 30,915 robbery incidents involving firearms that year [1].

As more and more law enforcement officers wear vest cameras and more and more citizens carry smartphones, the number of investigations involving audio forensic evidence continues to grow—and in some cases the audio recordings may include the sound of gunshots.

Is it possible to analyze a forensic audio recording containing gunshot sounds to discern useful forensic evidence?  In many cases the answer is yes.

Audio forensics, or forensic acoustics, involves evaluation of audio evidence either for a court of law or for some other official investigation [2].  Experts in audio forensics typically have special knowledge, training, and experience in the fields of acoustics, electrical engineering, and audio signal processing.

One common request in audio forensic investigations involving gunshots is “who fired first?”  There may be a dispute about the circumstances of a firearms incident, such as one party claiming that shots were fired in self-defense after the other party fired first, while the other party has the opposite claim.  Sometimes a dispute can arise if a witness reports that a law enforcement officer shot an armed but fleeing suspect without justification, while the officer claims that the suspect had fired.


Figure 1: Muzzle blast recording of a 9mm handgun obtained under controlled conditions [4].

The sound of a gunshot is often depicted in movies and computer games as a very dramatic “BOOM” that lasts as long as a second before fading away.  But the actual muzzle blast of a common handgun lasts only about 1 millisecond (one thousandth of a second; see Figure 1).  Beyond 20 to 30 meters from the gun, most of the audible sound of a gunshot is actually sound waves reflected by nearby surfaces [3].

Let’s consider a simplified case example from an investigation in an unnamed jurisdiction.  In this case, a shooting incident on a city street involving two perpetrators was recorded by a residential surveillance system located down the street.  The camera’s field of view did not show the incident, but the microphone picked up the sounds of gunfire.  Based on witness reports and the identification of shell casings and other physical evidence at the scene, the police investigators determined that the two perpetrators were several meters apart and fired their handguns toward each other, one pointing north and the other pointing south.  Which gun was fired first could not be determined from the physical evidence at the scene or from witness testimony, so attorneys for the suspects requested analysis of the audio recording to determine whether it could help answer the “who shot first?” question.

The waveform and the corresponding spectrogram from the portion of the recording containing the first two gunshot sounds are shown in Figure 2.  The spectrogram is a special kind of graph that depicts time on the horizontal axis and frequency on the vertical axis, with the brightness of the graph indicating the amount of sound energy present at a particular time in a particular frequency range.  The sound energy envelope for this same signal is shown in Figure 3.  The microphone picked up the direct sound of the gunshots, but also the reflected sound from the street, nearby buildings, and other obstacles, causing the relatively long duration of the two shots in the recording.
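A spectrogram and a sound energy envelope can both be computed from short, overlapping frames of the recording. The NumPy sketch below is a generic illustration, not the examiner's actual tool chain: the Hann window, the frame and hop sizes, and the synthetic two-burst "gunshot" signal are all assumptions chosen for clarity.

```python
import numpy as np


def spectrogram(x: np.ndarray, fs: float, frame: int = 512, hop: int = 128):
    """Power spectrogram: rows are time frames, columns are frequency bins."""
    if len(x) < frame:
        raise ValueError("signal shorter than one frame")
    starts = np.arange(0, len(x) - frame + 1, hop)
    window = np.hanning(frame)
    frames = np.stack([x[s:s + frame] * window for s in starts])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    times = (starts + frame / 2) / fs           # frame centres in seconds
    freqs = np.fft.rfftfreq(frame, d=1.0 / fs)  # bin centres in Hz
    return times, freqs, power


def energy_envelope_db(x: np.ndarray, fs: float, frame: int = 256, hop: int = 64):
    """Short-time energy in dB, a simple sound energy envelope."""
    if len(x) < frame:
        raise ValueError("signal shorter than one frame")
    starts = np.arange(0, len(x) - frame + 1, hop)
    energy = np.array([np.sum(x[s:s + frame] ** 2) for s in starts])
    # The small floor avoids taking the log of zero in silent frames.
    return (starts + frame / 2) / fs, 10 * np.log10(energy + 1e-12)


# Synthetic stand-in for a recording: two decaying noise bursts at 0.2 s and 1.0 s.
fs = 48000
rng = np.random.default_rng(1)
x = np.zeros(int(1.5 * fs))
for onset in (0.2, 1.0):
    start, n = int(onset * fs), int(0.05 * fs)
    x[start:start + n] += rng.standard_normal(n) * np.exp(-np.arange(n) / (0.005 * fs))

times, env_db = energy_envelope_db(x, fs)
t_spec, f_spec, power = spectrogram(x, fs)
print(power.shape)  # (time frames, frequency bins)
```

In a real recording, a discrete echo would show up as a second, weaker bump in the envelope at a fixed delay after the direct sound.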

In this case, we note that the first gunshot has a distinctive echo (indicated by the arrow), while the second gunshot does not show this feature.  What might account for this peculiar difference?


Figure 2:  Sound waveform and spectrogram of two gunshots recorded by a residential surveillance system.  The arrow indicates the distinctive echo.


Figure 3:  Sound energy envelope for the two gunshots depicted in Figure 2.  The arrow indicates the echo.

Examining the neighborhood street where the shooting incident took place (Figure 4) revealed a “T” intersection about 90 meters north of the shooting scene, with a large building facing the street.  The reflected sound path from the shooting site to the large building and back is therefore a little over 180 meters long, which corresponds to the 0.54 seconds of delay between the direct sound of the gunshot and the echo and accounts for the timing of the distinct reflection.  The microphone was located 30 meters south of the shooting scene.  But why would the observed reflection differ for the two firearms if they were located quite close together at the time of the gunfire?
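The timing argument reduces to simple arithmetic. If the microphone lies south of the shooters on the same street and the reflecting building lies north, the direct sound and the echo both travel the final 30 meters to the microphone, so the echo arrives later by the extra round trip to the building divided by the speed of sound. The sketch below assumes a speed of sound of 343 m/s (dry air at about 20 °C); the actual temperature at the scene is not given.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed: dry air at about 20 °C


def echo_delay(reflector_distance_m: float, c: float = SPEED_OF_SOUND) -> float:
    """Extra arrival time (s) of the echo relative to the direct sound.

    Assumes source, microphone, and reflector lie on one line with the source
    between them, so the echo's only extra path is the round trip
    source -> reflector -> source.
    """
    return 2.0 * reflector_distance_m / c


def reflector_distance(delay_s: float, c: float = SPEED_OF_SOUND) -> float:
    """Inverse of echo_delay: reflector distance (m) for a measured delay (s)."""
    return c * delay_s / 2.0


print(f"{echo_delay(90.0):.3f} s")          # building 90 m north of the shooters
print(f"{reflector_distance(0.54):.1f} m")  # measured 0.54 s delay
```

Under these assumptions, a building exactly 90 meters away gives a delay of about 0.525 s, and the measured 0.54 s corresponds to a reflector about 92.6 meters away (a round trip of about 185 meters), consistent with a path of “a little over 180 meters.”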


Figure 4:  Sketch of the shooting scene (plan view)

Our conclusion was that the firearm pointing north toward the “T” intersection would likely produce a stronger reflection because the muzzle blast of a handgun is louder in the direction the gun is pointing [5].  Thus, the gun pointing toward the reflecting surface would produce a stronger reflected sound than the gun pointing away from it.  Since only the first gunshot showed the distinct echo, the recording was consistent with the first shot having been fired by the gun pointing north.

While the availability of additional acoustic evidence of firearm incidents can only be a positive development for the U.S. justice system, interpreting audio recordings of gunshots remains a challenge for audio forensic examiners for several reasons. First, the acoustical characteristics of gunshots must be studied carefully because the recording is likely to include sound reflections, diffraction, reverberation, background sounds, and other content that can interfere with the analysis.  Second, common audio recorders are intended for speech signals, and therefore they are not designed to capture the very brief and very intense sounds of gunfire.  Finally, the acoustical similarities and differences among different types of firearms are still the subject of research, so the notion of having a simple database of gunshot sounds to compare with an evidentiary recording is not yet feasible.


[1]  U.S. Department of Justice, 2013 National Incident-Based Reporting System (NIBRS) Data Tables (2013). Available at https://www.fbi.gov/about-us/cjis/ucr/nibrs/2013/data-tables . Accessed May 6, 2016.

[2]  Maher, R.C., Lending an ear in the courtroom: forensic acoustics, Acoustics Today, vol. 11, no. 3, pp. 22-29, 2015.

[3]  Maher, R.C., Acoustical characterization of gunshots, Proceedings of the IEEE SAFE Workshop on Signal Processing Applications for Public Security and Forensics, Washington, DC, pp. 109-113 (2007).

[4]  Maher, R.C. and Shaw, S.R., Gunshot recordings from digital voice recorders, Proceedings of the Audio Engineering Society 54th Conference, Audio Forensics—Techniques, Technologies, and Practice, London, UK (2014).

[5]  Maher, R.C. and Shaw, S.R., Directional aspects of forensic gunshot recordings, Proceedings of the Audio Engineering Society 39th Conference, Audio Forensics—Practices and Challenges, Hillerød, Denmark (2010).

2aBAa7 – Ultrasonic “Soft Touch” for Breast Cancer Diagnosis

Mahdi Bayat – bayat.mahdi@mayo.edu
Alireza Nabavizadeh – nabavizadehrafsanjani.alireza@mayo.edu
Viksit Kumar – kumar.viksit@mayo.edu
Adriana Gregory – gregory.adriana@mayo.edu
Azra Alizad – alizad.azra@mayo.edu
Mostafa Fatemi – Fatemi.mostafa@mayo.edu
Mayo Clinic College of Medicine
200 First St SW
Rochester, MN 55905

Michael Insana- mfi@illinois.edu
University of Illinois at Urbana-Champaign
Department of Bioengineering
1270 DCL, MC-278
1304 Springfield Avenue
Urbana, IL 61801

Popular version of paper 2aBAa7, “Differentiation of breast lesions based on viscoelasticity response at sub-Hertz frequencies”
Presented Tuesday Morning, May 24, 2016, 9:30 AM, Snowbird/Brighton room
171st ASA Meeting, Salt Lake City

Breast cancer remains the leading cause of cancer death among American women under the age of 60. Although modern imaging technologies, such as enhanced mammography (tomosynthesis), MRI, and ultrasound, can visualize a suspicious mass in the breast, it often remains unclear whether the detected mass is cancerous or non-cancerous until a biopsy is performed.

Despite their high sensitivity for detecting lesions, no imaging modality alone has yet been able to determine the type of every abnormality with high confidence. For this reason, most patients with suspicious masses, even those with a very small likelihood of cancer, opt to undergo a costly and painful biopsy.

It has long been believed that cancerous tumors grow as stiff masses that, if close enough to the surface, can be identified by palpation. The feeling of hardness under palpation is directly related to the tissue’s tendency to deform under compression. Elastography, which has emerged as a branch of ultrasound imaging, aims to capture tissue stiffness by relating the amount of tissue deformation under compression to its stiffness. While this technique has shown promising results in identifying some types of breast lesions, the diversity of breast cancer types leaves doubt as to whether stiffness alone is the best discriminator for diagnostic purposes.

Studies have shown that tissues subjected to a sudden external force do not deform instantly; rather, they deform gradually over a period of time. The tissue’s deformation rate reveals another important aspect of its mechanical behavior, known as viscoelasticity. This is the material property that, for example, makes a piece of memory foam feel different from a block of rubber under the touch. The same property can be used to explore the mechanical behavior of different types of tissue. Studies have also shown that the biological pathways leading to different types of breast masses are quite different. In benign lesions, an increase in a protein-based component can potentially increase viscosity and hence slow the deformation rate compared to normal tissue, whereas the opposite trend occurs in malignant tumors.
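A standard textbook way to describe this gradual deformation is the Kelvin-Voigt model, a spring and a damper in parallel. Under a step stress, strain approaches its final value with time constant τ = η/E, so a more viscous material (larger η) deforms more slowly. The text does not say which rheological model the study used; the model and parameter values below are assumptions for illustration.

```python
import numpy as np


def kelvin_voigt_creep(t, stress, modulus, viscosity):
    """Strain of a Kelvin-Voigt solid after a step stress applied at t = 0.

    strain(t) = (stress / modulus) * (1 - exp(-t / tau)),  tau = viscosity / modulus
    """
    tau = viscosity / modulus
    return (stress / modulus) * (1.0 - np.exp(-np.asarray(t, dtype=float) / tau))


# Hypothetical materials with the same stiffness but different viscosity.
stress = 100.0                       # Pa, step load
modulus = 5e3                        # Pa
t = np.array([0.5, 1.0, 2.0, 5.0])   # s

for label, viscosity in (("lower viscosity", 5e3), ("higher viscosity", 15e3)):
    fraction = kelvin_voigt_creep(t, stress, modulus, viscosity) / (stress / modulus)
    print(label, np.round(fraction, 2))
```

Both materials end at the same strain because they share the same modulus; only the speed of getting there differs, which is the information a stiffness-only measurement cannot see.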

In this study, we report on an ultrasound technique that captures the deformation rate in breast tissue. We studied 43 breast masses in 42 patients and observed that a factor based on the deformation rate was significantly different between benign and malignant lesions (Fig. 1).
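The text does not define the factor based on the deformation rate, or the "relative deformation rate" shown in Figure 1. As a hypothetical illustration only, the sketch below fits a creep time constant to a sampled creep curve by linear regression on a log scale, then expresses the lesion's creep rate (1/τ) relative to that of the surrounding tissue. The synthetic curves, the known final strain, and the ratio definition are all assumptions.

```python
import numpy as np


def fit_time_constant(t, strain, final_strain):
    """Estimate tau for strain(t) = final * (1 - exp(-t / tau)).

    Linear regression of log(1 - strain / final) against t gives slope -1/tau.
    """
    t = np.asarray(t, dtype=float)
    remaining = 1.0 - np.asarray(strain, dtype=float) / final_strain
    usable = remaining > 1e-6            # drop samples that have fully settled
    slope = np.polyfit(t[usable], np.log(remaining[usable]), 1)[0]
    return -1.0 / slope


def relative_deformation_rate(tau_lesion, tau_background):
    """Hypothetical metric: lesion creep rate (1/tau) relative to background."""
    return tau_background / tau_lesion


# Synthetic, noise-free creep curves (hypothetical values, not study data).
t = np.linspace(0.0, 10.0, 200)            # s
background = 0.02 * (1.0 - np.exp(-t / 1.0))
lesion = 0.01 * (1.0 - np.exp(-t / 3.0))   # slower creep, e.g. higher viscosity

tau_bg = fit_time_constant(t, background, 0.02)
tau_lesion = fit_time_constant(t, lesion, 0.01)
print(round(relative_deformation_rate(tau_lesion, tau_bg), 3))
```

With real data the final strain is not known in advance and measurements are noisy, so a nonlinear least-squares fit of both the amplitude and τ (for example with SciPy's `curve_fit`) would usually be more appropriate than this log-linear shortcut.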

The results of this study point to a promising new imaging biomarker for diagnosing breast masses. If the technique proves to be highly accurate in a larger pool of patients, it could be integrated into breast examination procedures to improve the accuracy of diagnosis, reduce unnecessary biopsies, and help detect cancerous tumors early.


Figure 1. Distribution of relative deformation rates for malignant and benign breast lesions. The relative deformation rates differ significantly between the two groups, allowing such lesions to be differentiated.