4aAB4 – A Machine Learning Model of the Global Ambient Sound Level

Shane V. Lympany – shane.lympany@blueridgeresearch.com
Michael M. James – michael.james@blueridgeresearch.com
Alexandria R. Salton
Matthew F. Calton
Blue Ridge Research and Consulting, LLC
29 N Market St, Suite 700
Asheville, NC 28801

Kent L. Gee
Mark K. Transtrum
Katrina Pedersen
Department of Physics and Astronomy
Brigham Young University
Provo, Utah 84602

Popular version of paper 4aAB4
Presented Thursday morning, December 5, 2019
178th ASA Meeting, San Diego, CA

Work funded by an Army SBIR

Traffic on a busy road, birds chirping, rushing water—these are some of the many sounds that make up the ambient soundscape, or acoustic environment, that surrounds us. The ambient soundscape is produced by anthropogenic (man-made) and natural sources, and, in turn, the ambient sound level affects the behavior and well-being of humans and animals. This raises an important question: how does the ambient sound level vary across Earth's surface? To answer this question, we developed a machine learning model to predict the ambient sound level at every point on Earth's land surface, and we used the model to estimate the global impact of anthropogenic noise.

First, we trained a machine learning model to identify the relationships between more than 1.5 million hours of ambient sound level measurements and 37 environmental variables, such as population density, land cover, and climate. The model predicts the median sound level in A-weighted decibels (dBA). (A-weighting adjusts the sound level based on how the human ear perceives loudness.)
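
For readers curious how such a model might be set up, the following is a minimal sketch in Python. The file name, column names, and the choice of a random-forest regressor are illustrative assumptions, not the exact data or model used in this work.

    # Sketch: fit a regression model that maps environmental variables to
    # measured median ambient sound levels (hypothetical inputs).
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor

    # One row per measurement site: 37 environmental columns plus the
    # measured median daytime level in dBA (column names are placeholders).
    data = pd.read_csv("ambient_levels.csv")
    features = data.drop(columns=["median_dBA"])
    target = data["median_dBA"]

    model = RandomForestRegressor(n_estimators=500, random_state=0)
    model.fit(features, target)

    # The fitted model can then predict the median ambient level anywhere
    # the same 37 environmental variables are available.
    predicted_dBA = model.predict(features)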

We applied the machine learning model to predict the median daytime ambient sound level at every point on Earth’s land surface (Figure 1). The loudest sound levels occur in highly populated areas, and the quietest sound levels occur in dry biomes with few humans or animals.

Figure 1. Median daytime ambient sound level produced by anthropogenic and natural sources.

Next, we estimated the natural sound level (Figure 2) by applying the machine learning model to environmental variables that we modified to remove the influence of humans. The natural sound level is loudest in areas with significant biodiversity, such as rainforests.
Figure 2. Median daytime ambient sound level produced by natural sources only.

The difference between the overall and natural sound levels (Figure 3) is the amount that anthropogenic noise increases the existing ambient sound level above the natural level. Approximately 5.5 billion people and 28 million square kilometers—an area the size of Russia and Canada combined—are affected by anthropogenic noise that increases the ambient sound level by 3 dBA or more. A 3-dBA increase means that anthropogenic noise is about as loud as the natural sound level. Furthermore, approximately 2.2 billion people and 6.1 million square kilometers—an area the size of the Amazon Rainforest—are affected by anthropogenic noise that increases the ambient sound level by 10 dBA or more. A 10-dBA increase means that anthropogenic noise roughly doubles the perceived loudness of the ambient sound level compared to the natural level.
Figure 3. Difference between the overall and natural ambient sound levels.
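
Continuing the sketch above, the natural sound level and the difference map could be computed along these lines. The grid file, its column names (including which variables count as anthropogenic), and the population and area columns are hypothetical placeholders rather than the study's actual inputs.

    # Sketch: predict overall and natural levels on a global grid, then tally
    # the population and land area affected above 3 dBA and 10 dBA.
    import pandas as pd

    grid = pd.read_csv("global_grid.csv")  # one row per grid cell
    env_cols = [c for c in grid.columns if c not in ("population", "area_km2")]

    overall_dBA = model.predict(grid[env_cols])

    # Remove the human influence by zeroing anthropogenic variables, then
    # predict again to estimate the natural level.
    natural_inputs = grid[env_cols].copy()
    natural_inputs["population_density"] = 0.0
    natural_dBA = model.predict(natural_inputs)

    excess_dBA = overall_dBA - natural_dBA  # increase due to anthropogenic noise

    for threshold in (3.0, 10.0):
        affected = excess_dBA >= threshold
        people = grid.loc[affected, "population"].sum()
        area = grid.loc[affected, "area_km2"].sum()
        print(f">= {threshold} dBA: {people:.3g} people over {area:.3g} km^2")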

In this research, we produced the first-ever global maps of the overall and natural ambient sound levels, and we showed that anthropogenic noise impacts billions of people and vast land areas worldwide. Furthermore, our method for modifying environmental variables is a powerful tool that enables us to predict the effects of future scenarios, such as population growth, urbanization, deforestation, and climate change, on the ambient sound level.

4aSC19 – Consonant Variation in Southern Speech

Lisa Lipani – llipani@uga.edu
Michael Olsen – michael.olsen25@uga.edu
Rachel Olsen – rmm75992@uga.edu
Department of Linguistics
University of Georgia
142 Gilbert Hall
Athens, Georgia 30602

Popular version of paper 4aSC19
Presented Thursday morning, December 5, 2019
178th ASA Meeting, San Diego, CA

We all recognize that people from different areas of the United States have different ways of talking, especially in how they pronounce their vowels. Think, for example, about stereotypical Bostonians who might “pahk the cah in Havahd Yahd”. The field of sociolinguistics studies speech sounds from different groups of people to establish and understand regional American dialects.

While there are decades of research on vowels, sociolinguists have recently begun to ask whether consonants such as p, b, t, d, k, and g also vary depending on where people are from or what social groups they belong to. These consonants are known as “stop consonants,” because the airflow “stops” due to a closure in your vocal tract. One acoustic characteristic of these consonants is voice onset time, the amount of time between the release of that closure and the start of vocal fold (also known as vocal cord) vibration. We wanted to know whether some groups of speakers, say men versus women or Texans versus other Southern speakers, pronounced their consonants differently than other groups. In order to investigate this, we used the Digital Archive of Southern Speech (DASS), which contains 367 hours of recordings made across the southeastern United States between 1970 and 1983, consisting of approximately two million words of Southern speech.

The original DASS researchers were mostly interested in differences in language based on the age of speakers and their geographic location. In the interviews, people were asked about specific words that might indicate their dialect. For example, do you say “pail” or “bucket” for the thing you might borrow from Jack and Jill?

We used computational methods to investigate Southern consonants in DASS, looking at pronunciations of p, b, t, d, k, and g at the beginning of roughly 144,000 words. Our results show that ethnicity is a social factor in the production of these sounds. In our data, African Americans had longer voice onset times, meaning that there was a longer period of time between the release of the stop consonant and the start of vocal fold vibration, even when we adjusted the data for speaking rate. This kind of research is important because as we describe differences in the way we speak, we can better understand how we express our social and regional identity.
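
As an illustration of this kind of analysis, the sketch below tests whether voice onset time differs across speaker groups while adjusting for speaking rate, using a mixed-effects regression with a by-speaker random intercept. The file and column names are hypothetical, and this is not necessarily the exact statistical model used in the study.

    # Sketch: does VOT differ by speaker group once speaking rate is controlled?
    import pandas as pd
    import statsmodels.formula.api as smf

    # One row per word-initial stop token: measured VOT (seconds), the
    # speaker's ethnicity and sex, and the local speaking rate (syllables/s).
    tokens = pd.read_csv("dass_stop_tokens.csv")

    model = smf.mixedlm("vot ~ ethnicity + sex + speaking_rate",
                        data=tokens, groups=tokens["speaker"])
    result = model.fit()
    print(result.summary())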

5aSC8 – How head and eyebrow movements help make conversation a success

Samantha Danner – sfgordon@usc.edu
Dani Byrd – dbyrd@usc.edu
Department of Linguistics, University of Southern California
Grace Ford Salvatori Hall, Rm. 301
3601 Watt Way
Los Angeles, CA 90089-1693

Jelena Krivokapić – jelenak@umich.edu
Department of Linguistics, University of Michigan
440 Lorch Hall
611 Tappan Street
Ann Arbor, MI 48109-1220

Popular version of poster 5aSC8
Presented Friday morning, December 6, 2019
178th ASA Meeting, San Diego, CA

It’s easy to take for granted our ability to have a conversation, even with someone we’ve never met before. In fact, the human capacity for choreographing conversation is quite incredible. The average time from when one speaker stops speaking to when the next speaker starts is only about 200 milliseconds. Yet somehow, speakers are able to let their conversation partner know when they are ready to turn over the conversational ‘floor.’ Likewise, people somehow sense when it is their turn to start speaking. How, without any conscious effort, is this dance of conversation between two people so smooth?

One possible answer to this question is that we use non-verbal communication to help move conversations along. The study described in this presentation looks at how movements of the eyebrows and the head might be used by participants in conversation to help determine when to exchange the conversational floor with one another. For this research, speakers conversed in pairs, taking turns to collaboratively recite a well-known nursery rhyme like ‘Humpty Dumpty’ or ‘Jack and Jill.’ Using nursery rhymes allowed us to study spontaneous speech (speech that is not rehearsed or read) while providing many opportunities for the members of each pair to take turns speaking. We used an instrument called an electromagnetic articulograph to precisely track the eyebrow and head movements of the two conversing people. Their speech was also recorded, so that it was clear exactly when in the conversation each person’s brow and head movements occurred.
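
One simple way to turn the articulograph's position traces into countable "movements" is to flag moments when a sensor is moving faster than some threshold. The sketch below illustrates the idea with made-up data; the sampling rate, threshold, and detection criteria are assumptions, not the study's actual procedure.

    # Sketch: flag brow/head movement events by thresholding speed.
    import numpy as np

    fs = 100.0  # samples per second (assumed)
    t = np.arange(0, 10, 1 / fs)
    # Hypothetical vertical brow position (mm): mostly still, one raise near 4 s.
    position = 0.2 * np.random.randn(t.size) + 5.0 * np.exp(-((t - 4.0) ** 2) / 0.05)

    speed = np.abs(np.gradient(position, 1 / fs))  # mm per second
    moving = speed > 20.0  # assumed threshold
    print(f"Movement detected in {moving.mean():.1%} of samples")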

We wondered whether we would see more frequent movements of the eyebrows and head when someone is acting as a speaker as opposed to a listener during the conversation, and whether we would see more or less frequent movement at particular moments in the conversation, such as when one person yields the conversational floor to the other, or interrupts the other, or finds that they need to start speaking again after an awkward pause.

We found that listeners move their heads and brows more frequently than speakers. This may mean that people in conversation use face movements to show their engagement with what their partner is saying. We also found that the moment in conversation when movements are most frequent is at interruptions, indicating that listeners may use co-speech movements to signal that they are about to interrupt a speaker.

This research on spoken language helps linguists understand how humans can converse so easily and effectively, highlighting some of the many behaviors we use in talking to each other. Actions of the face and body facilitate the uniquely human capacity for language communication—we use so much more than just our voices to make a conversation happen.

2aNSb7 – Automatically finding focused crowd involvement at basketball games and Mardi Gras parades

Mylan Cook – mylan.cook@gmail.com
Eric Todd – eric.w.todd@gmail.com
Kent L. Gee – kentgee@byu.edu
Mark K. Transtrum – mkt24@byu.edu
Brigham Young University
N283 ESC
Provo, UT 84602

David S. Woolworth – dwoolworth@rwaconsultants.net
Roland, Woolworth & Associates, LLC
356 CR 102
Oxford, MS 38655

Popular version of paper 2aNSb7, “Detecting instances of focused crowd involvement at recreational events”
Presented Tuesday morning, December 3, 2019
178th ASA Meeting, San Diego, CA
Read the article in Proceedings of Meetings on Acoustics

Audio processing is often used to deal with a single person’s voice, but how do things change when dealing with an entire crowd? While it is relatively easy for a person to judge whether a crowd is booing or cheering, teaching a computer to differentiate between different crowd responses is a challenging problem. Of particular interest herein is the challenge of determining when a crowd is making a concentrated, unified, or focused effort. This research has applications in rewarding crowds, sales, and riot prevention.

Previous work has gone into studying crowds at basketball games using machine learning techniques such as K-means clustering. Using spectral sound levels—the loudness at different frequencies—K-means automatically divides our sound samples into different groups, separating levels of crowd noise from levels of band noise or PA system noise.

Figure 1: Representative spectral sound levels for the different groupings (BasketballSpectra.jpg).

Figure 1 shows the representative frequency-dependent spectral sound levels for these different groupings.  Using other features common to audio signal processing allows us to automatically divide signals into other sub-groups, one of which appears to correspond to the aforementioned focused crowd effort.
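
As a rough illustration of the clustering step, K-means can be applied to a matrix of per-segment spectral levels. The feature matrix below is randomly generated and the number of clusters is an assumption; it simply stands in for measured spectral levels (for example, band levels in decibels) from the actual recordings.

    # Sketch: group audio segments by their spectral sound levels with K-means.
    import numpy as np
    from sklearn.cluster import KMeans

    # Hypothetical features: one row per audio segment, one column per band level (dB).
    rng = np.random.default_rng(0)
    band_levels = rng.normal(loc=70.0, scale=8.0, size=(1000, 30))

    labels = KMeans(n_clusters=4, random_state=0, n_init=10).fit_predict(band_levels)
    print(np.bincount(labels))  # how many segments fell into each group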

Video 1 presents a graphical representation of some of these features and how they fluctuate with time; the colors in the video show the different sub-groups found, and by examination, the purple sub-group is found to consist primarily of audio segments that demonstrate the most focused crowd effort.

The purpose of this investigation is to determine whether a similar process can be followed to find focused crowd efforts in another type of crowd, namely the crowd at a Mardi Gras parade, as recorded from a microphone mounted on a float. There are some challenges here: the two crowds differ in their frequency content, and the audio from the Mardi Gras crowd shows very little variation—essentially the crowd is cheering the entire time, so changes in crowd behavior get buried in the crowd’s clamorous cacophony.

There is still something we can do, however. Within the high-involvement basketball sub-group we find two audio features—flux, which is the change in energy over time, and slope, which marks how quickly the energy increases as frequency increases—with very large numerical values. By setting thresholds on these features, we can mark all the Mardi Gras data that exceeds them as likely to exhibit focused crowd involvement.
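
To make the two features concrete, here is a minimal sketch of computing spectral flux (the frame-to-frame change in energy) and spectral slope (how magnitude changes with frequency) from a signal, and flagging frames where both are unusually large. The signal, frame settings, and thresholds are all placeholders rather than the values used in the study.

    # Sketch: compute spectral flux and slope per frame, then threshold them.
    import numpy as np
    from scipy.signal import stft

    fs = 22050
    audio = np.random.randn(10 * fs)  # placeholder for recorded crowd audio

    freqs, times, Z = stft(audio, fs=fs, nperseg=2048)
    mag = np.abs(Z)

    # Spectral flux: change in the magnitude spectrum between adjacent frames.
    flux = np.sqrt(np.sum(np.diff(mag, axis=1) ** 2, axis=0))

    # Spectral slope: least-squares slope of magnitude versus frequency per frame.
    slope = np.array([np.polyfit(freqs, frame, 1)[0] for frame in mag.T])

    # Flag frames where both features exceed assumed thresholds (95th percentile).
    focused = (flux > np.percentile(flux, 95)) & (slope[1:] > np.percentile(slope, 95))
    print(f"{focused.sum()} of {focused.size} frames flagged as focused involvement")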

Video 2 presents a graphical representation of the Mardi Gras parade crowd noise, where audio segments that surpass the threshold, and so are likely to contain a concentrated crowd effort, are shown in green, and all other segments are shown in red. While validation is ongoing, these results show promise for automatically identifying focused crowd involvement in different types of crowds.

4pBAb3 – Bubbles on kidney stones in humans

Julianna C. Simon – jcsimon@psu.edu
Graduate Program in Acoustics
The Pennsylvania State University
201E Applied Science Building
University Park, PA 16802
AND
Center for Industrial and Medical Ultrasound
Applied Physics Laboratory
University of Washington
1013 NE 40th St.
Seattle, WA 98105

Popular version of paper 4pBAb3
Presented Thursday afternoon, December 5, 2019
178th ASA Meeting, San Diego, CA

Kidney stones affect 1 in 11 Americans, with an associated annual cost exceeding $10 billion. Most people are diagnosed with kidney stones using x-ray or CT when they go to the emergency room with severe side pain. Ultrasound can also be used, but the greyscale images can be difficult to interpret. Even though Doppler ultrasound is usually used to monitor blood flow, when you image a kidney stone with Doppler ultrasound, it appears as rapidly changing color, called the twinkling artifact (Fig. 1). This can enhance kidney stone detection with ultrasound, but it doesn’t appear on all stones. Recently, bubbles on the stone were suggested to cause the twinkling artifact because small changes in pressure influence the appearance of twinkling. However, these studies were done on stones outside of the human body, where exposure to air could artificially introduce the bubbles.


Fig. 1: The color Doppler ultrasound twinkling artifact highlights a kidney stone with rapidly-changing color.

In this presentation, we look at the origin of the twinkling artifact by imaging kidney stones in human subjects with ultrasound while they are inside a pressure, or hyperbaric, chamber (Fig. 2). Seven human subjects were exposed to 4 atmospheres of pressure while their kidney stones were imaged with ultrasound. We found that twinkling was reduced by 39% at 4 atmospheres compared to twinkling at atmospheric pressure, a statistically significant difference. This result suggests, for the first time, that bubbles exist on the surface of kidney stones in humans!


Fig. 2: Imaging people with kidney stones inside a hyperbaric chamber. Due to the risk of fire, the ultrasound system was outside the chamber with the transducer inserted through a port.

We were also curious as to whether these bubbles existed only on the stone surface, or whether they could be embedded in the stone during the stone formation process. In the lab, stones that had been removed from humans were imaged with high resolution CT while reducing the pressure around the stone. We found regions within the stone that grew when we reduced the pressure (Fig. 3), suggesting that bubbles can exist inside the stone, too.

Fig. 3: A high-resolution CT scan of a human kidney stone shows a small, dark region within a crack that expands when the ambient pressure is reduced (yellow circle), suggesting the bubbles are contained within the kidney stone.

NASA funded the study because they are interested in ultrasound for spaceflight and have found that changes in gas composition on space vehicles and pressure in spacesuits affect bubbles associated with decompression sickness and ultrasound imaging. Spaceflight increases the risk of forming kidney stones because bones demineralize, releasing calcium into the blood that is filtered through the kidneys. Astronauts have prepared for an emergency return to Earth only four times, and one of those times was for a kidney stone that eventually passed. Because of the risk to astronauts and people on Earth, both NASA and NIH have funded researchers at the University of Washington to develop an ultrasound system to image, fragment, and expel stones from the kidney. The key to using this system is being able to see the stone with ultrasound, which is where this work on understanding the twinkling artifact plays an important role.

2pAB2 – Sound pollution decreases the chances of love for oyster toadfish

Rosalyn Putland – rputland@d.umn.edu
University of Minnesota Duluth, 1035 Kirby Drive
Duluth, Minnesota 55812

Alayna Mackiewicz – alaynam@live.unc.edu
University of North Carolina Chapel Hill, 120 South Road
Chapel Hill, North Carolina 27599

Jacey Van Wert – jcvanwert@ucsb.edu
University of California, Santa Barbara
Santa Barbara, California 93106

Allen Mensinger – amensing@d.umn.edu
University of Minnesota Duluth, 1035 Kirby Drive
Duluth, Minnesota 55812

Popular version of paper 2pAB2
Presented Tuesday afternoon, December 3, 2019
178th ASA Meeting, San Diego, California

Despite the famous marine explorer Jacques Cousteau describing the underwater environment as a “silent world”, scientists have discovered that sound plays a key role in the lives of many aquatic species. For example, many fishes vocalize to deter predators and attract mates. The oyster toadfish, Opsanus tau, has a rich vocal repertoire, producing a variety of calls by rapidly contracting muscles along its swimbladder. At the beginning of the mating season, in early summer, male toadfish establish a nest and produce calls termed “boatwhistles” to both announce their territory to competing males and attract females to lay eggs in their nest. However, despite toadfish being studied since 1888, surprisingly little is known about what part of the song attracts the female, how far the male’s call carries, and the potential effects of sound produced by anthropogenic activity in the coastal waters where toadfish reside.

Figure 1: Photograph of an oyster toadfish, Opsanus tau. Photo by Allen Mensinger.

Therefore, in 2015 the Mensinger lab began conducting passive acoustic monitoring of a resident population of oyster toadfish located in Eel Pond, a small saltwater harbor adjacent to the Marine Biological Laboratory (MBL) in Woods Hole, Massachusetts. Male toadfish produce unique acoustic signatures in their boatwhistles, and the relatively small number of toadfish in the area (< 15) provided an opportunity to study the effect of anthropogenic sound on individual male calling, because males remain in their nests for the entire mating season. Additionally, the movements of large motorized watercraft are restricted by a drawbridge, providing a natural control, or quiet time, at night when no man-made sound was present. The lab suspended four hydrophones from the dock to record boatwhistles and mathematically compute individual fish locations.
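
Locating a calling fish from several hydrophones typically relies on time differences of arrival (TDOA): a call reaches each hydrophone at a slightly different time, and those delays constrain where the caller can be. The sketch below shows one common least-squares approach; the hydrophone coordinates, sound speed, and delays are hypothetical, and the lab's actual localization method may differ.

    # Sketch: estimate a 2-D source position from TDOAs at four hydrophones.
    import numpy as np
    from scipy.optimize import least_squares

    C = 1500.0  # approximate speed of sound in seawater, m/s
    hydrophones = np.array([[0.0, 0.0], [5.0, 0.0], [5.0, 5.0], [0.0, 5.0]])  # (x, y), m
    tdoa = np.array([0.0012, 0.0021, 0.0009])  # delays of phones 1-3 relative to phone 0, s

    def residuals(pos):
        dists = np.linalg.norm(hydrophones - pos, axis=1)
        return (dists[1:] - dists[0]) / C - tdoa  # modeled minus measured TDOAs

    fit = least_squares(residuals, x0=np.array([2.5, 2.5]))
    print("Estimated caller position (m):", fit.x)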

Figure 2: Toadfish boatwhistles during an 8-second calling window. Three distinct individuals were identified based on waveform shape, spectrogram components, and amplitude, with one highlighted in the box. From Putland et al. 2018.

Toadfish SOUND 1 (1 second)

Toadfish SOUND 2 (1 minute)
Sound clip of oyster toadfish boatwhistles recorded in Eel Pond, Woods Hole, Massachusetts, USA.

Figure 3: Photograph of equipment being deployed at the end of the dock with the research vessel passing by. Photo by Rosalyn Putland.

Male toadfish called significantly less after exposure to even brief (5 to 10 minutes) vessel sound, with the motor noise swamping both the frequency range (50 – 500 Hz) and the “loudness” of toadfish boatwhistles. Unique sound characteristics of the boatwhistle (pitch, volume, duration) are thought to be used by females to pick the biggest and best mates. For example, larger fish tend to produce lower-frequency, louder, and longer boatwhistles than smaller individuals. However, exposure to vessel sound could potentially mask the mating call and leave the females swimming aimlessly.
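
One way to quantify how vessel noise overlaps the boatwhistle band is to band-pass filter a hydrophone recording to 50-500 Hz and track the level over time. The sketch below uses a synthetic signal in place of a real recording, and the filter settings and uncalibrated levels are illustrative only.

    # Sketch: relative sound level in the 50-500 Hz boatwhistle band over time.
    import numpy as np
    from scipy.signal import butter, sosfiltfilt

    fs = 8000
    recording = np.random.randn(60 * fs)  # placeholder hydrophone signal (1 minute)

    sos = butter(4, [50, 500], btype="bandpass", fs=fs, output="sos")
    band = sosfiltfilt(sos, recording)

    # Relative level (dB) in one-second windows.
    window = fs
    levels = [10 * np.log10(np.mean(band[i:i + window] ** 2))
              for i in range(0, band.size - window + 1, window)]
    print(f"Band level range: {min(levels):.1f} to {max(levels):.1f} dB (relative)")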

Determining when, where, and how often fish such as the oyster toadfish produce sound allows acoustically sensitive times and areas to be prioritized in management strategies. For example, vessel speed restrictions or limits on boat traffic could be enforced to reduce sound levels during critical spawning periods. The male toadfish is considered by many to be unattractive, and he does not need man-made sound interfering with his love song.