1aSPa5 – Saving Lives During Disasters by Using Drones

Macarena Varela – macarena.varela@fkie.fraunhofer.de
Wulf-Dieter Wirth – wulf-dieter.wirth@fkie.fraunhofer.de
Fraunhofer FKIE/ Department of Sensor Data and Information Fusion (SDF)
Fraunhoferstr. 20
53343 Wachtberg, Germany

Popular version of ‘1aSPa5 Bearing estimation of screams using a volumetric microphone array mounted on a UAV’
Presented Tuesday morning 9:30 AM – 11:15 AM, June 8, 2021
180th ASA Meeting, Acoustics in Focus

During disasters, such as earthquakes or shipwrecks, every minute counts to find survivors.

Unmanned Aerial Vehicles (UAVs), also called drones, can reach inaccessible areas and cover larger areas better than rescuers on the ground or other types of vehicles, such as Unmanned Ground Vehicles. UAVs can now be equipped with state-of-the-art technology to provide quick situational awareness and to support rescue teams in locating victims during disasters.

[Video: Field experiment using the MEMS system mounted on the drone to hear impulsive sounds produced by a potential victim.mp4]

Survivors typically plead for help by producing impulsive sounds, such as screams. Therefore, an accurate acoustic system mounted on a drone is currently being developed at Fraunhofer FKIE, focused on localizing those potential victims.

The system will filter out environmental and UAV noise in order to detect human screams and other impulsive sounds reliably. It will use a particular type of microphone array, called a “Crow’s Nest Array” (CNA), combined with advanced signal processing techniques (beamforming) to provide accurate locations of the specific sounds produced by missing people (see Figure 1). The spatial distribution and number of microphones in an array have a crucial influence on the accuracy of the estimated location, so it is important to select them properly.
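To make the beamforming idea concrete, here is a minimal sketch of a delay-and-sum beamformer that scans candidate bearings and keeps the one with the most steered power. It is an illustration only, not the FKIE system: the microphone positions are random rather than a Crow's Nest geometry, the source is assumed to be far away and in the horizontal plane, and the helper names, sampling rate and speed of sound are all assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in air; assumed value


def arrival_delays(mic_xyz, azimuth_deg):
    """Arrival delay (s) at each microphone, relative to the array origin,
    for a far-field source in the horizontal plane at the given azimuth."""
    az = np.deg2rad(azimuth_deg)
    toward_source = np.array([np.cos(az), np.sin(az), 0.0])
    # Microphones displaced toward the source hear the wavefront earlier.
    return -(mic_xyz @ toward_source) / SPEED_OF_SOUND


def simulate_plane_wave(source, mic_xyz, fs, azimuth_deg):
    """Hypothetical test helper: delay one source signal to every microphone."""
    freqs = np.fft.rfftfreq(source.size, 1.0 / fs)
    delays = arrival_delays(mic_xyz, azimuth_deg)
    shifts = np.exp(-2j * np.pi * freqs[None, :] * delays[:, None])
    spectrum = np.fft.rfft(source)[None, :]
    return np.fft.irfft(spectrum * shifts, n=source.size, axis=1)


def estimate_azimuth(signals, mic_xyz, fs, candidates_deg):
    """Return the candidate azimuth (degrees) with the highest steered power."""
    spectra = np.fft.rfft(signals, axis=1)
    freqs = np.fft.rfftfreq(signals.shape[1], 1.0 / fs)
    powers = []
    for az in candidates_deg:
        delays = arrival_delays(mic_xyz, az)
        # Undo the delays expected for this direction, then sum the channels.
        align = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
        beam = (spectra * align).sum(axis=0)
        powers.append(np.sum(np.abs(beam) ** 2))
    return float(candidates_deg[int(np.argmax(powers))])


# Illustrative run: 16 microphones scattered in a 20 cm cube (made-up layout).
rng = np.random.default_rng(0)
mics = rng.uniform(-0.1, 0.1, size=(16, 3))
scream = rng.standard_normal(1024)  # stand-in for an impulsive sound
recorded = simulate_plane_wave(scream, mics, 16000, azimuth_deg=40.0)
print(estimate_azimuth(recorded, mics, 16000, np.arange(0.0, 360.0, 1.0)))
```

Changing the number of microphones or the size of the cube in this toy setup changes how sharply the steered power peaks, which is one way to see why the array layout matters for accuracy.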

Figure 1: Conceptual diagram to localize victims

The system components are minimized in quantity, weight and size, for the purpose of being mounted on a drone. With this in mind, the microphone array is composed of a large number of tiny digital Micro-Electro-Mechanical-Systems (MEMS) microphones to find the locations of the victims. In addition, one supplementary condenser microphone covering a larger frequency spectrum will be used to obtain a more precise signal for detection and classification purposes.

Figure 2: Acoustic system mounted on a drone

Several experiments, including open-field trials, have been conducted successfully and demonstrate the good performance of the system under development.

3aSP1 – Using Physics to Solve the Cocktail Party Problem

Keith McElveen – keith.mcelveen@wavesciencescorp.com
Wave Sciences
151 King Street
Charleston, SC USA 29401

Popular version of paper ‘Robust speech separation in underdetermined conditions by estimating Green’s functions’
Presented Thursday morning, June 10th, 2021
180th ASA Meeting, Acoustics in Focus

Nearly seventy years ago, a hearing researcher named Colin Cherry said that “One of our most important faculties is our ability to listen to, and follow, one speaker in the presence of others. This is such a common experience that we may take it for granted; we may call it ‘the cocktail party problem.’ No machine has been constructed to do just this, to filter out one conversation from a number jumbled together.”

Despite many claims of success over the years, the Cocktail Party Problem has resisted solution.  The present research investigates a new approach that blends tricks used by human hearing with laws of physics. With this approach, it is possible to isolate a voice based on where it must have come from – somewhat like visualizing balls moving around a billiard table after being struck, except in reverse, and in 3D. This approach is shown to be highly effective in extremely challenging real-world conditions with as few as four microphones – the same number as found in many smart speakers and pairs of hearing aids.

The first “trick” is something that hearing scientists call “glimpsing”. Humans subconsciously piece together audible “glimpses” of a desired voice as it momentarily rises above the level of competing sounds. After gathering enough glimpses, our brains “learn” how the desired voice moves through the room to our ears and use this knowledge to ignore the other sounds.

The second “trick” is based on how humans use sounds that arrive “late”, because they bounced off of one or more large surfaces along the way. Human hearing somehow combines these reflected “copies” of the talker’s voice with the direct version to help us hear more clearly.

The present research mimics human hearing by using glimpses to build a detailed physics model – called a Green’s Function – of how sound travels from the talker to each of several microphones. It then uses the Green’s Function to reject all sounds that arrived via different paths and to reassemble the direct and reflected copies into the desired speech. The accompanying sound file illustrates typical results this approach achieves.

Original Cocktail Party Sound File, Followed by Separated Nearest Talker, then Farthest

While prior approaches have struggled to equal human hearing in a realistic cocktail party babel, even at close distances, the research results we are presenting imply that it is now possible to not only equal, but to exceed human hearing and solve The Cocktail Party Problem, even with a small number of microphones in no particular arrangement.

The many implications of this research include improved conference call systems, hearing aids, automotive voice command systems, and other voice assistants – such as smart speakers. Our future research plans include further testing as well as devising intuitive user interfaces that can take full advantage of this capability.

No one knows exactly how human hearing solves the Cocktail Party Problem, but it would be very interesting indeed if it is found to use its own version of a Green’s Function.

1aSPa4 – The Sound of Drones

Valentin V. Gravirov –  vvg@ifz.ru
Russian Federation
Moscow 123242

Popular version of paper 1aSPa4
Presented Tuesday morning, June 8, 2021
180th ASA Meeting, Acoustics in Focus

Good afternoon, dear readers! I represent a research team from Russia, and in this brief popular-science summary I would like to tell you about the essence of our recent work. Our main goal was to study the sound generated by drones during flight in order to solve the problem of detecting and recognizing them automatically. It’s no secret that unmanned aerial vehicles, or drones, are now developing extremely fast. Drones are beginning to be used everywhere, for example for filming, searching for missing people, and delivering documents and small packages. Obviously, over time, the number of tasks they perform and the number of unmanned aerial vehicles will continue to increase. This will inevitably lead to an increase in the number of collisions in the air.

Last year, as part of our expedition to the Arctic region, we personally encountered a similar problem.

Our expeditionary team used two drones to photograph a polar bear, and the two nearly collided. That is, two quadcopters almost collided even though there was no other drone within a radius of a thousand kilometers. Imagine the danger to air traffic when many devices are flying close together. In civilian use, such a problem can be solved by fitting drones with active radio beacons, but in official use, for example in military tasks, such systems will obviously be unacceptable. To solve such problems, a large number of optical systems for recognizing drones have already been created, but they do not always give accurate results and often depend significantly on weather conditions or the time of day. That is why our research group has set itself the goal of studying the acoustic noise generated by unmanned aerial vehicles. This will allow us to find new ways to solve the urgent problem of detecting and locating drones.

In the course of the experiments, the sound generated by typical drone electric motors fitted with propellers with different numbers of blades was studied in detail. The analysis of the results allowed us to conclude that the main contribution to the noise is determined by the rate at which the blades pass, which is equal to the rotational speed of the motor shaft multiplied by the number of blades. At the same time, due to the presence of small defects in the blades, the sound of each specific blade is slightly different.

[Image: household drone models used (DJI Mavic)]

The studies also examined the noise generated by two popular household DJI Mavic drone models in dense urban environments with high levels of urban acoustic noise. It was found that at distances exceeding 30 meters, the acoustic signal disappears into the background urban noise, which can be explained by the small size and low power of the models studied. Undoubtedly, outside the city or in a quiet place, the detection range of drones will be significantly greater. In the course of the experiments, it was found that the main sound generated by drones lies in the frequency range 100–2000 Hz.
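The relation between shaft speed, blade count and the dominant tone can be written down in a few lines. The sketch below is only an illustration of that relation and of the 100–2000 Hz band reported above; the function names and the example motor speed are hypothetical and are not measurements from the study.

```python
def blade_pass_frequency_hz(shaft_rpm, n_blades):
    """Blade-passing rate: shaft revolutions per second times blade count."""
    return shaft_rpm / 60.0 * n_blades


def harmonics_in_band(fundamental_hz, low_hz=100.0, high_hz=2000.0):
    """Multiples of the fundamental that fall inside the band of interest."""
    if fundamental_hz <= 0:
        raise ValueError("fundamental must be positive")
    harmonics = []
    k = 1
    while k * fundamental_hz <= high_hz:
        if k * fundamental_hz >= low_hz:
            harmonics.append(k * fundamental_hz)
        k += 1
    return harmonics


# Hypothetical example: a two-blade propeller spinning at 6000 rpm.
fundamental = blade_pass_frequency_hz(6000, 2)
print(fundamental, harmonics_in_band(fundamental))
```

A detector could look for such evenly spaced tones inside the band, although the text does not say that the authors' neural network works this way.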

In addition to the field experiments, mathematical modeling was carried out, and its results agree with the experimental data. An algorithm based on artificial neural networks has been developed for the automated recognition of drones. At present, the algorithm detects a drone with 94% accuracy. Unfortunately, the probability of false positives is still high, at about 12%. In the near future, this will require both additional research and significant improvement of the recognition algorithm.

2pSPc4 – Determining the Distance of Sperm Whales in the Northern Gulf of Mexico from an Underwater Acoustic Recording Device

Kendal Leftwich – kmleftwi@uno.edu
George Drouant – George.Drouant@oit.edu
Julia Robe – jerobe@uno.edu
Juliette Ioup – jioup@uno.edu

University of New Orleans
2000 Lakeshore Drive
New Orleans, LA 70148

Popular version of paper ‘Determining the range to marine mammals in the Northern Gulf of Mexico via Bayesian acoustic signal processing’
Presented in the Acoustic Localization IV session, afternoon of December 8, 2020
179th ASA Meeting, Acoustics Virtually Everywhere

The Littoral Acoustic Demonstration Center – Gulf Ecological Monitoring and Modeling (LADC-GEMM) collected underwater acoustic data in the Northern Gulf of Mexico (GoM) from 2002 through 2017. Figure 1 shows the collection sites and the location of the BP oil spill in April 2010. The data are collected by a hydrophone, an underwater microphone, which records the acoustic signals or sounds of the region.

One of the goals of the research at the University of New Orleans (UNO) is to identify individual marine mammals by their acoustic signal.  Part of this identification includes being able to locate them.   In this paper we will briefly explain how we are attempting to locate sperm whales in the GoM.

First, we need to understand how the whale’s sounds travel through the water and what happens to them as they do. Any sound that travels through a medium (air, water, or any other material) loses loudness along the way. For example, it is much easier to hear a person talking to you when you are in the same room; if they are talking to you through a wall, their voice is quieter because the signal travels through a medium (the wall) that reduces its loudness. In the same way, as the whale signal travels through the GoM to our hydrophones, its loudness is reduced. The size of this effect is determined by the temperature, the depth of the recording device below the surface, the salinity, and the pH level of the water. Using this information, we can determine how much the loudness of the whale’s signal decreases per kilometer that the signal travels. This can be seen in Figure 2.

We use the known loudness of the sound emitted by a sperm whale, the recorded loudness of the signal, and the impact of the GoM on the signal to determine how far away the sperm whale is from our hydrophone. Unfortunately, due to technical limitations of the equipment, we can only do this for a single hydrophone, so we cannot currently locate the sperm whale’s exact position. We can only say that it lies somewhere at a certain distance around the hydrophone. Figure 3 shows graphically the results of our calculations for two of the 276 sperm whale signals we used with our model to estimate how far away the whale is from our hydrophone.
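The range calculation described above can be sketched as a small numerical inversion. The version below is a simplified stand-in, not the authors' Bayesian method: it assumes the loss between whale and hydrophone is spherical spreading plus a fixed absorption per kilometer, and every level and coefficient in the example is a made-up placeholder.

```python
import math


def transmission_loss_db(range_m, alpha_db_per_km):
    """Assumed loss model: spherical spreading plus absorption along the path."""
    return 20.0 * math.log10(range_m) + alpha_db_per_km * range_m / 1000.0


def estimate_range_m(source_level_db, received_level_db, alpha_db_per_km,
                     lo=1.0, hi=1.0e6, tol=1.0e-3):
    """Find the range whose modeled loss equals source level minus received level."""
    target = source_level_db - received_level_db
    if not (transmission_loss_db(lo, alpha_db_per_km) <= target
            <= transmission_loss_db(hi, alpha_db_per_km)):
        raise ValueError("measured loss is outside the searched range interval")
    # The modeled loss grows with range, so bisection converges to one answer.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if transmission_loss_db(mid, alpha_db_per_km) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical numbers: a 200 dB source heard at 127.5 dB with 1 dB/km absorption.
print(estimate_range_m(200.0, 127.5, 1.0))
```

With a single hydrophone this yields only a distance, which matches the limitation noted in the text: the whale could be anywhere at that distance around the recorder.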

2aSP2 – Self-Driving Cars: Have You Considered Using Sound?

Keegan Yi Hang Sim – yhsim@connect.ust.hk
Yijia Chen
Yuxuan Wan
Kevin Chau

Department of Electronic and Computer Engineering
Hong Kong University of Science and Technology
Clear Water Bay
Hong Kong

Popular version of paper 2aSP2
Presented Tuesday morning, December 03, 2019
178th Meeting, San Diego, CA

Self-driving cars are currently a major interest for engineers around the globe. They incorporate more advanced versions of the steering and acceleration control found in many of today’s cars. Cameras, radars, and lidars (light detection and ranging) are frequently used to detect obstacles and automatically brake to avoid collision. Air bags, which have been in use since as early as 1951, soften the impact during an actual collision.

Vision Zero, an ongoing multinational effort, hopes that all car crashes will eventually be eliminated, and self-driving autonomous vehicles are likely to play a key role in achieving this. However, current technology is unlikely to be enough, as it works poorly in low-light conditions. We believe that using sound, which carries unique information, is also important, as it can be used in all scenarios and likely performs much better.

Sound waves travel up to seventeen times faster through a car than through the air, where they travel at about 1/3 of a kilometer per second. This allows much faster detection using sound than using acceleration, and sound is clearly not affected by light, air quality, and other factors. Previous research was able to use sound to detect collisions and sirens, but by the time a collision occurs, it is far too late. So instead we want to identify sounds that frequently occur before car crashes, such as tire skidding, honking, and sometimes screaming, and to figure out the direction they are coming from. Thus, we have designed a method to predict a car crash by detecting and isolating the sounds of tire skidding that might signal a possible crash.

The algorithm utilizes the discrete wavelet transform (DWT), which decomposes a sound wave into high- and low-frequency components, each made up of tiny waves that last for only a short period of time. This can be done repeatedly, yielding a series of components of various frequencies. Using wavelets is significantly faster and gives a much more accurate and precise representation of the transient events associated with car crashes than elementary techniques such as the Fourier Transform, which transforms a sound into its steady oscillation components. Previous methods to detect car crashes examined the highest frequency components, but tire skidding only contains lower frequency components, whereas a collision contains almost all frequencies.
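The repeated split into low- and high-frequency components can be illustrated with the simplest wavelet, the Haar wavelet. This is a sketch under assumptions: the text does not say which wavelet or how many levels the authors use, and the function names are invented for the example.

```python
import numpy as np


def haar_dwt(x):
    """One DWT level: low-frequency (approximation) and high-frequency (detail) halves."""
    x = np.asarray(x, dtype=float)
    if x.size % 2:
        raise ValueError("signal length must be even")
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)


def haar_idwt(approx, detail):
    """Invert one level by interleaving the recovered even and odd samples."""
    x = np.empty(2 * approx.size)
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x


def haar_wavedec(x, levels):
    """Apply the split repeatedly to the low-frequency part.

    Returns the final approximation and a list of detail bands,
    ordered from the highest-frequency band to the lowest.
    """
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(detail)
    return approx, details


# Band energies of a stand-in signal (random noise, not real crash audio).
signal = np.random.default_rng(0).standard_normal(64)
approx, details = haar_wavedec(signal, levels=3)
print([float(np.sum(d ** 2)) for d in details], float(np.sum(approx ** 2)))
```

Comparing the energy in the lower bands with the energy in the highest band is one simple way to tell a sound confined to low frequencies from one that spans almost all of them.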

One can hear in the original audio of a car crash the three distinct sections: honking, tire skidding, and the collision.

The top diagram shows the audio displayed as a waveform, plotted against time. The bottom shows a spectrogram of the audio, with frequency on the vertical axis and time on the horizontal axis, and the brightness of the color representing the magnitude of a particular frequency component. This was created using a variation of the Fourier Transform. One can observe the differences in appearance between honking, tire skidding, and collision, which suggests that mathematical methods should be able to detect and isolate these. We can also see that the collision occupies all frequencies while tire skidding occupies lower frequencies with two distinct sharp bands at around 2000 Hz.
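A spectrogram of this kind is commonly computed with a short-time Fourier transform: the audio is cut into short overlapping frames and each frame is transformed separately. The sketch below shows that idea; the frame length, hop size, window and test tone are assumptions chosen for illustration, not the settings used for the figure.

```python
import numpy as np


def spectrogram(x, fs, frame_len=256, hop=128):
    """Magnitude spectrogram: one row per time frame, one column per frequency."""
    x = np.asarray(x, dtype=float)
    if x.size < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)  # taper each frame to reduce spectral leakage
    n_frames = 1 + (x.size - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frame_len, 1.0 / fs)
    times = (np.arange(n_frames) * hop + frame_len / 2) / fs
    return times, freqs, magnitude


# Stand-in signal: one second of a steady 2000 Hz tone sampled at 16 kHz.
fs = 16000
t = np.arange(fs) / fs
times, freqs, magnitude = spectrogram(np.sin(2 * np.pi * 2000.0 * t), fs)
print(freqs[magnitude.argmax(axis=1)][:5])  # strongest frequency in the first frames
```

In such a picture a steady tone appears as a horizontal band, while a sound that occupies all frequencies shows up as a bright vertical stripe.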

“OutCollision.wav, the audio that isolates the car crash”

Using our algorithm, we were able to create audio files that isolate the honking, the tire skidding, and the collision. They may not sound like normal honking, tire skidding, or collisions, which is a byproduct of our algorithm, but this does not affect a computer’s ability to detect the various events.

The algorithm performs well for detecting the honking and tire skidding, and is fast enough to run in real time, before acceleration information can be processed. This would be valuable for raising the alert of a possible crash and for activating the hazard lights and seatbelt pre-tensioners. The use of sound in cars is a big step forward for the prevention of car crashes, as well as for improving autonomous and driverless vehicles and achieving Vision Zero, by providing a car with more timely and valuable information about its surroundings.

2aSPa8 and 4aSP6 – Safe and Sound – Using acoustics to improve the safety of autonomous vehicles

Eoin A King – eoking@hartford.edu
Akin Tatoglu
Digno Iglesias
Anthony Matriss
Ethan Wagner

Department of Mechanical Engineering
University of Hartford
200 Bloomfield Avenue
West Hartford, CT 06117

Popular version of papers 2aSPa8 and 4aSP6
Presented Tuesday and Thursday morning, May 14 & 16, 2019
177th ASA Meeting, Louisville, KY

In cities across the world every day, people use and process acoustic alerts to safely interact in and amongst traffic; drivers listen for emergency sirens from police cars or fire engines, or the sounding of a horn to warn of an impending collision, while pedestrians listen for cars when crossing a road – a city is full of sounds with meaning, and these sounds make the city a safer place.

Future cities will see the large-scale deployment of (semi-) autonomous vehicles (AVs). AV technology is quickly becoming a reality; however, the manner in which AVs and other vehicles will coexist and communicate with one another is still unclear, especially during the prolonged period of mixed vehicles sharing the road. In particular, the manner in which Autonomous Vehicles can use acoustic cues to supplement their decision-making process is an area that needs development.

The research presented here aims to identify the meaning behind specific sounds in a city related to safety. We are developing methodologies to recognize and locate acoustic alerts in cities and use this information to inform the decision-making process of all road users, with particular emphasis on Autonomous Vehicles. Initially we aim to define a new set of audio-visual detection and localization tools to identify the location of a rapidly approaching emergency vehicle. In short we are trying to develop the ‘ears’ to complement the ‘eyes’ already present on autonomous vehicles.

Test Set-Up
For our initial tests we developed a low-cost array consisting of two linear arrays of 4 MEMS microphones. The array was used in conjunction with a mobile robot equipped with visual sensors, as shown in Picture 1. Our array acquired acoustic signals that were analyzed to i) identify the presence of an emergency siren, and then ii) determine the location of the sound source (which was occasionally behind an obstacle). Initially our tests were conducted in the controlled setting of an anechoic chamber.

Picture 1: Test Robot with Acoustic Array

Step 1: Using convolutional neural networks for the detection of an emergency siren
Using advanced machine learning techniques, it has become possible to ‘teach’ a machine (or a vehicle) to recognize certain sounds. We used a deep layer Convolutional Neural Network (CNN) and trained it to recognize emergency sirens in real time, with 99.5% accuracy in test audio signals.

Step 2: Identifying the location of the source of the emergency siren
Once an emergency sound has been detected, it must be rapidly localized. This is a complex task in a city environment, due to moving sources, reflections from buildings, other noise sources, etc. However, by combining acoustic results with information acquired from the visual sensors already present on an autonomous vehicle, it will be possible to identify the location of a sound source. In our research, we modified an existing direction-of-arrival algorithm to report a number of sound source directions, arising from multiple reflections in the environment (i.e. every reflection is recognized as an individual source). These results can be combined with the 3D map of the area acquired from the robot’s visual sensors. A reverse ray tracing approach can then be used to triangulate the likely position of the source.
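The reverse ray tracing step can be illustrated in two dimensions with a single flat wall. The sketch below is a toy version under strong assumptions: the array sits at the origin, the map contains one vertical wall at a known x position, and one bearing is known to be the direct path while the other bounced once off that wall. The function names and coordinates are hypothetical.

```python
import numpy as np


def unit(bearing_deg):
    """Unit vector for a bearing measured counterclockwise from the x-axis."""
    a = np.deg2rad(bearing_deg)
    return np.array([np.cos(a), np.sin(a)])


def locate_source(direct_deg, reflected_deg, wall_x):
    """Trace both bearings back from the array and intersect them.

    The reflected bearing is followed to the wall at x = wall_x, mirrored
    there, and then intersected with the ray along the direct bearing.
    """
    d_direct = unit(direct_deg)
    d_reflected = unit(reflected_deg)
    if d_reflected[0] <= 0:
        raise ValueError("reflected bearing never reaches the wall")
    hit = d_reflected * (wall_x / d_reflected[0])  # point where the ray meets the wall
    d_mirrored = np.array([-d_reflected[0], d_reflected[1]])  # bounce off a vertical wall
    # Solve u * d_direct = hit + s * d_mirrored for the two path lengths u and s.
    u, s = np.linalg.solve(np.column_stack([d_direct, -d_mirrored]), hit)
    if u <= 0 or s <= 0:
        raise ValueError("rays do not meet in front of the array and the wall")
    return u * d_direct


# Hypothetical scene: source at (2, 3) m, wall at x = 5 m, array at the origin.
direct = np.degrees(np.arctan2(3.0, 2.0))
reflected = np.degrees(np.arctan2(3.0, 8.0))  # bearing toward the mirror image at (8, 3)
print(locate_source(direct, reflected, wall_x=5.0))
```

In a real street the wall positions would come from the 3D map built by the visual sensors, and deciding which bearing is direct and which is reflected is part of the problem rather than a given.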

Picture 2: Example test results. Note in this test our array indicates a source at approximately 30° and another at approximately -60°.

Picture 3: Ray Trace Method. Note, by tracing the path of the estimated angles, both reflected and direct, the approximate source location can be triangulated.

Video explaining theory.