Detecting the presence of a drone and estimating its range simply from its audio emissions

Kaliappan Gopalan –

Purdue University Northwest, Hammond, IN, 46323, United States

Brett Y. Smolenski, North Point Defense, Rome, NY, USA
Darren Haddad, Information Exploitation Branch, Air Force Research Laboratory, Rome, NY, USA

Popular version of 1ASP8-Detection and Classification of Drones using Fourier-Bessel Series Representation of Acoustic Emissions, presented at the 183rd ASA Meeting.

With the proliferation of drones – from medical supply and hobbyist to surveillance, fire detection and illegal drug delivery, to name a few – of various sizes and capabilities flying day or night, it is imperative to detect their presence and estimate their range for security, safety and privacy reasons.

Our paper describes a technique for detecting the presence of a drone, as opposed to environmental noise such as birds and moving vehicles, simply from the audio emissions of the drone's motors, propellers and mechanical vibrations. By applying a feature extraction technique that separates a drone's distinct audio spectrum from that of atmospheric noise, and employing machine learning algorithms, we identified the correct class of drone, out of three classes flying outdoors, in over 78% of cases. Additionally, we estimated the range of a drone from the observation point to within ±50 cm in over 85% of cases.

We extracted unique features characterizing each type of drone using a mathematical technique known as the Fourier-Bessel series expansion. Using these features, which differentiate not only the drone class but also the drone range, we trained a deep learning network with ground-truth values of drone type, or of range as a discrete variable at intervals of 50 cm. When the trained network was tested with new, unseen features, it returned the correct type of drone, with a nonzero range, and a range class within ±50 cm of the actual range.
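To make the feature extraction concrete, here is a minimal sketch of the zero-order Fourier-Bessel series coefficients of a signal frame, using the standard expansion in the Bessel function J0; the function name and frame normalization are our own choices, not the authors' implementation.

```python
import numpy as np
from scipy.special import jv, jn_zeros
from scipy.integrate import trapezoid

def fourier_bessel_coeffs(x, num_coeffs):
    """Zero-order Fourier-Bessel series coefficients of a signal frame x
    mapped onto the unit interval: x(t) ~ sum_m c_m J0(lambda_m t)."""
    t = np.linspace(0.0, 1.0, len(x))
    roots = jn_zeros(0, num_coeffs)        # positive roots lambda_m of J0
    c = np.empty(num_coeffs)
    for m, lam in enumerate(roots):
        # c_m = (2 / J1(lambda_m)^2) * integral_0^1 t x(t) J0(lambda_m t) dt
        c[m] = 2.0 * trapezoid(t * x * jv(0, lam * t), t) / jv(1, lam) ** 2
    return c

# Orthogonality check: a frame equal to the first basis function should
# produce c_1 close to 1 and the remaining coefficients close to 0.
t = np.linspace(0.0, 1.0, 4000)
lam1 = jn_zeros(0, 1)[0]
c = fourier_bessel_coeffs(jv(0, lam1 * t), 4)
```

Unlike Fourier coefficients, these coefficients use damped oscillatory basis functions, which is one reason they suit non-stationary sounds such as propeller noise.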

Any point along the main diagonal indicates a correct range class, that is, within ±50 cm of the actual range, while off-diagonal values correspond to classification errors.

To identify more than three types of drones, we tested seven different models, namely the DJI S1000, DJI M600, Phantom 4 Pro, Phantom 4 QP with a quieter set of propellers, Mavic Pro Platinum, Mavic 2 Pro, and Mavic Pro, all tethered in an anechoic chamber in an Air Force laboratory and controlled by an operator through a series of propeller maneuvers (idle, left roll, right roll, pitch forward, pitch backward, left yaw, right yaw, half throttle, and full throttle) to fully capture the array of sounds the craft emit. Our trained deep learning network correctly identified the drone type in 84% of our test cases. Figure 1 shows the results of range classification for each outdoor drone flying at line-of-sight ranges from 0 (no drone) to 935 m.

3pSP4 – Imaging Watermelons

Dr. David Joseph Zartman
Zartman Inc., L.L.C.,
Loveland, Colorado

Popular version of 3pSP4 – Imaging watermelons
Presented Wednesday afternoon, May 25, 2022
182nd ASA Meeting, Denver

When imaging watermelons, everything can be simplified down to measuring a single variable, ripeness, which characterizes the internal medium of the watermelon, rather than looking for internal reflections from contents such as seeds. The optimal acoustic approach is thus a through-transmission measurement: exciting the wave on one side and measuring the result on the other.

Before investigating the acoustic properties, it is useful to examine watermelons’ ripening properties from a material perspective.  As the fruit develops, it starts off very hard and fibrous with a thick skin. Striking an object like this would be similar to hitting a rock, or possibly a stick given the fibrous nature of the internal contents of the watermelon.

As the watermelon ripens, this solid fiber starts to contain more and more liquid, which also sweetens over time. This process continues and transforms the fruit from something too fibrous and bitter to something juicy and sweet. Most people have their own preference for exactly how crunchy versus sweet they like it. The skin also thins throughout this process. As the fibers continue to break down beyond optimal ripeness, the fruit becomes mostly fluid, possibly overly sweet, and with a very thin skin. Striking the fruit at this stage would be similar to hitting some sort of water balloon. While the sweet juice sounds like a positive, the overall texture at this stage is usually not considered desirable.

In review, as watermelons ripen, they transform from something extremely solid to something more resembling a liquid-filled water balloon. These are the under-ripe and over-ripe conditions; the personal ideal exists somewhere between the two. Some choose to focus on the crunchy earlier stage at the cost of some sweetness (possibly also preferable for anyone struggling with blood sugar issues), in contrast to those who prefer to maximize the sweet, juicy nature of the later stages at the cost of crunchy texture.

The common form of acoustic measurement in this situation is simply to strike the surface of the watermelon with a finger knuckle and listen to the sound. More accuracy is possible by feeling with the fingertips on the opposite side of the watermelon as it is struck. Neither very young nor very old fruit has much response: the young fruit is too hard, giving an immediate sharp report that is more painful to the impacting finger, while the old fruit is more liquid and thus more difficult to excite acoustically. A young watermelon may make a sound described as a hard ‘tink’, while an old one could be described more as a soft ‘phlub’. In between, it is possible to feel the fibers in the liquid vibrating for a period of time, creating a sound more like a ‘toong’. A shorter resonance, ‘tong’, indicates younger fruit, while more difficulty getting sound through, ‘tung’, indicates older fruit.
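The knuckle-tap test can be put into numbers as two quantities: the dominant frequency of the tap and how long the resonance rings. The sketch below computes both from a synthetic damped tone (no real watermelon recordings are used, and the thresholds are illustrative assumptions).

```python
import numpy as np
from scipy.signal import hilbert

def tap_features(x, fs):
    """Return (dominant frequency in Hz, ring-down time in s) of a tap
    recording; ring-down is the time for the amplitude envelope to fall
    to 1/e of its peak."""
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    dominant = freqs[np.argmax(np.abs(np.fft.rfft(x)))]
    env = np.abs(hilbert(x))                 # amplitude envelope
    i_peak = int(np.argmax(env))
    below = np.nonzero(env[i_peak:] < env[i_peak] / np.e)[0]
    ring_down = below[0] / fs if below.size else (len(x) - i_peak) / fs
    return dominant, ring_down

# Synthetic 'toong': a 400 Hz tone decaying with a 50 ms time constant.
fs = 8000
t = np.arange(0, 0.5, 1.0 / fs)
tap = np.exp(-t / 0.05) * np.sin(2 * np.pi * 400 * t)
f0, ring = tap_features(tap, fs)
```

A higher dominant frequency with a very short ring would correspond to the hard ‘tink’ of a young fruit; a long, clear ring to the ‘toong’ of a ripe one.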

An optimal watermelon can thus be chosen by feeling or hearing the resonant properties of the fruit when it is struck, and choosing according to preference.

1aSPa5 – Saving Lives During Disasters by Using Drones – Macarena Varela

Macarena Varela –
Wulf-Dieter Wirth –
Fraunhofer FKIE/ Department of Sensor Data and Information Fusion (SDF)
Fraunhoferstr. 20
53343 Wachtberg, Germany

Popular version of paper ‘1aSPa5’
Presented Tuesday morning 9:30 AM – 11:15 AM, June 8, 2021
180th ASA Meeting, Acoustics in Focus

During disasters, such as earthquakes or shipwrecks, every minute counts to find survivors.

Unmanned Aerial Vehicles (UAVs), also called drones, can reach and cover inaccessible and large areas better than rescuers on the ground or other types of vehicles, such as Unmanned Ground Vehicles. Nowadays, UAVs can be equipped with state-of-the-art technology to provide quick situational awareness and support rescue teams in locating victims during disasters.

[Video: Field experiment using the MEMS system mounted on the drone to hear impulsive sounds produced by a potential victim.mp4]

Survivors typically plead for help by producing impulsive sounds, such as screams. Therefore, an accurate acoustic system mounted on a drone is currently being developed at Fraunhofer FKIE, focused on localizing those potential victims.

The system will filter environmental and UAV noise in order to detect human screams or other impulsive sounds. It will use a particular type of microphone array, called a “Crow’s Nest Array” (CNA), combined with advanced signal processing techniques (beamforming) to provide accurate locations of the specific sounds produced by missing people (see Figure 1). The spatial distribution and number of microphones in an array have a crucial influence on the estimated location accuracy, so it is important to select them properly.

Figure 1: Conceptual diagram to localize victims
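The beamforming idea mentioned above can be illustrated with a toy narrowband delay-and-sum beamformer. The real CNA is a three-dimensional arrangement of many MEMS microphones; the linear geometry, microphone count, and signal below are illustrative assumptions only.

```python
import numpy as np

C = 343.0  # speed of sound in air, m/s

def beam_power(snapshots, mic_x, freq, angles_deg):
    """Narrowband delay-and-sum beamformer for a linear array: steer
    toward each candidate arrival angle and return the output power."""
    k = 2.0 * np.pi * freq / C
    out = []
    for ang in np.deg2rad(angles_deg):
        steer = np.exp(-1j * k * mic_x * np.sin(ang))   # array response
        out.append(np.mean(np.abs(snapshots @ np.conj(steer)) ** 2))
    return np.array(out)

# Simulate a 440 Hz tone arriving from 25 degrees at an 8-microphone
# linear array with 0.35 m spacing.
mic_x = np.arange(8) * 0.35
freq = 440.0
theta0 = np.deg2rad(25.0)
a0 = np.exp(-1j * (2.0 * np.pi * freq / C) * mic_x * np.sin(theta0))
rng = np.random.default_rng(0)
s = rng.standard_normal(200) + 1j * rng.standard_normal(200)
snapshots = np.outer(s, a0)                 # 200 snapshots x 8 mics
angles = np.arange(-90.0, 91.0, 1.0)
est_deg = angles[np.argmax(beam_power(snapshots, mic_x, freq, angles))]
```

Steering the array over candidate angles and picking the power peak recovers the arrival direction; with more microphones in a 3-D arrangement, as in the CNA, the peak localizes the source in space rather than just in angle.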

The system components are minimized in quantity, weight and size so that they can be mounted on a drone. With this in mind, the microphone array is composed of a large number of tiny digital Micro-Electro-Mechanical-Systems (MEMS) microphones to find the locations of the victims. In addition, one supplementary condenser microphone covering a larger frequency spectrum will be used to obtain a more precise signal for detection and classification purposes.

Figure 2: Acoustic system mounted on a drone

Different experiments, including open field experiments, have successfully been conducted, demonstrating the good performance of the ongoing project.

3aSP1 – Using Physics to Solve the Cocktail Party Problem – Keith McElveen

Keith McElveen –
Wave Sciences
151 King Street
Charleston, SC USA 29401

Popular version of paper ‘Robust speech separation in underdetermined conditions by estimating Green’s functions’
Presented Thursday morning, June 10th, 2021
180th ASA Meeting, Acoustics in Focus

Nearly seventy years ago, a hearing researcher named Colin Cherry wrote: “One of our most important faculties is our ability to listen to, and follow, one speaker in the presence of others. This is such a common experience that we may take it for granted; we may call it ‘the cocktail party problem.’ No machine has been constructed to do just this, to filter out one conversation from a number jumbled together.”

Despite many claims of success over the years, the Cocktail Party Problem has resisted solution.  The present research investigates a new approach that blends tricks used by human hearing with laws of physics. With this approach, it is possible to isolate a voice based on where it must have come from – somewhat like visualizing balls moving around a billiard table after being struck, except in reverse, and in 3D. This approach is shown to be highly effective in extremely challenging real-world conditions with as few as four microphones – the same number as found in many smart speakers and pairs of hearing aids.

The first “trick” is something that hearing scientists call “glimpsing”. Humans subconsciously piece together audible “glimpses” of a desired voice as it momentarily rises above the level of competing sounds. After gathering enough glimpses, our brains “learn” how the desired voice moves through the room to our ears and use this knowledge to ignore the other sounds.

The second “trick” is based on how humans use sounds that arrive “late”, because they bounced off of one or more large surfaces along the way. Human hearing somehow combines these reflected “copies” of the talker’s voice with the direct version to help us hear more clearly.

The present research mimics human hearing by using glimpses to build a detailed physics model – called a Green’s Function – of how sound travels from the talker to each of several microphones. It then uses the Green’s Function to reject all sounds that arrived via different paths and to reassemble the direct and reflected copies into the desired speech. The accompanying sound file illustrates typical results this approach achieves.

McElveen_Before_Then_Near_Then_Far_Talkers.wav, Original Cocktail Party Sound File, Followed by Separated Nearest Talker, then Farthest
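The core idea of estimating how sound travels from a talker to a microphone can be reduced to a toy example: recover a discrete impulse response (a sampled Green's function) from a known source segment and the microphone signal by regularized FFT deconvolution. This is an illustrative reduction of the concept, not Wave Sciences' actual algorithm, and all names below are our own.

```python
import numpy as np

def estimate_impulse_response(source, mic, n_taps, eps=1e-6):
    """Regularized FFT deconvolution: estimate the impulse response h
    such that mic is approximately source convolved with h."""
    n = len(source) + n_taps
    S = np.fft.rfft(source, n)
    X = np.fft.rfft(mic, n)
    H = X * np.conj(S) / (np.abs(S) ** 2 + eps)
    return np.fft.irfft(H, n)[:n_taps]

# A toy 'room': a direct path plus two reflections of opposite sign.
rng = np.random.default_rng(1)
talker = rng.standard_normal(4000)          # a glimpsed source segment
h_true = np.zeros(64)
h_true[[0, 20, 45]] = [1.0, 0.6, -0.3]      # direct + two echoes
mic = np.convolve(talker, h_true)
h_est = estimate_impulse_response(talker, mic, 64)
```

Once such a response is known, sound arriving along other paths can be rejected, and the direct and reflected copies of the desired voice can be realigned and combined, which is the reassembly step described above.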

While prior approaches have struggled to equal human hearing in realistic cocktail party babble, even at close distances, the results we are presenting imply that it is now possible not only to equal but to exceed human hearing and solve the Cocktail Party Problem, even with a small number of microphones in no particular arrangement.

The many implications of this research include improved conference call systems, hearing aids, automotive voice command systems, and other voice assistants – such as smart speakers. Our future research plans include further testing as well as devising intuitive user interfaces that can take full advantage of this capability.

No one knows exactly how human hearing solves the Cocktail Party Problem, but it would be very interesting indeed if it is found to use its own version of a Green’s Function.

1aSPa4 – The Sound of Drones – Valentin V. Gravirov


Valentin V. Gravirov –
Moscow 123242, Russian Federation


Popular version of paper 1aSPa4

Presented Tuesday morning, June 8, 2021

180th ASA Meeting, Acoustics in Focus


Good afternoon, dear readers! I represent a research team from Russia, and in this brief popular summary I would like to tell you about the essence of our recent work. Our main goal was to study the sound generated by drones in flight in order to solve the problem of automatically finding and recognizing them. It is no secret that unmanned aerial vehicles, or drones, are now developing and progressing extremely fast. Drones are beginning to be used everywhere, for example for filming, searching for missing people, and delivering documents and small packages. Obviously, over time both the number of tasks and the number of unmanned aerial vehicles will continue to increase. This will inevitably lead to an increase in the number of collisions in the air.

Last year, as part of our expedition to the Arctic region, we personally encountered a similar problem.

Our expeditionary team used two drones to photograph a polar bear, and the two quadrocopters nearly collided, in circumstances where there was no other drone within a radius of a thousand kilometers. Imagine the danger to air traffic when many devices fly nearby. For civilian use, the problem can be solved with active radio beacons on drones, but for official use, for example in military tasks, such systems are obviously unacceptable. A large number of optical systems for recognizing drones have already been created, but they do not always give accurate results and often depend significantly on weather conditions or the time of day. That is why our research group set itself the goal of studying the acoustic noise generated by unmanned aerial vehicles; this will allow us to find new ways to solve the pressing problem of detecting and locating drones.

In the course of the experiments, the sound generated by typical drone electric motors fitted with propellers with different numbers of blades was studied in detail. Analysis of the results showed that the main contribution to the noise comes at the blade-pass frequency: the rotational speed of the engine shaft multiplied by the number of blades. At the same time, because of small defects in the blades, the sound of each individual blade is slightly different. The studies also examined the noise generated by two popular household drone models (DJI Mavic) in a dense urban environment with high levels of acoustic noise. It was found that at distances exceeding 30 meters the acoustic signal disappears into the background urban noise, which can be explained by the small size and low power of the models studied. Outside the city, or in a quiet place, the detection range of drones will undoubtedly be significantly greater. The experiments also showed that the main sound generated by drones lies in the 100–2000 Hz frequency range.
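The blade-pass rule described above, rotation rate times blade count, is easy to check numerically. The sketch below uses an illustrative motor speed and a synthetic propeller tone, not measured data.

```python
import numpy as np

def blade_pass_hz(shaft_rpm, n_blades):
    """Fundamental propeller tone: shaft revolutions per second
    multiplied by the number of blades."""
    return shaft_rpm / 60.0 * n_blades

# A two-blade propeller at 6000 rpm hums at 200 Hz, inside the
# 100-2000 Hz band noted in the text. Verify that an FFT of a
# synthetic propeller sound peaks at that frequency.
fs = 8000
t = np.arange(0, 1.0, 1.0 / fs)
f_bp = blade_pass_hz(6000, 2)
tone = np.sin(2 * np.pi * f_bp * t) + 0.4 * np.sin(2 * np.pi * 2 * f_bp * t)
freqs = np.fft.rfftfreq(len(t), 1.0 / fs)
peak_hz = freqs[np.argmax(np.abs(np.fft.rfft(tone)))]
```

The weaker second harmonic in the synthetic tone stands in for the blade-to-blade differences mentioned above, which spread energy into harmonics of the blade-pass frequency.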

In addition to the field experiments, mathematical modeling was carried out, and its results agree with the experimental data. An algorithm based on artificial neural network technology has been developed for automated drone recognition. At present the algorithm detects a drone with 94% accuracy. Unfortunately, the false-positive rate is still high, at about 12%. In the near future this will require both additional research and significant improvement of the recognition algorithm.

2pSPc4 – Determining the Distance of Sperm Whales in the Northern Gulf of Mexico from an Underwater Acoustic Recording Device – Kendal Leftwich


Kendal Leftwich –

George Druant –

Julia Robe –

Juliette Ioup –

University of New Orleans

2000 Lakeshore Drive

New Orleans, LA 70148


Popular version of paper ‘Determining the range to marine mammals in the Northern Gulf of Mexico via Bayesian acoustic signal processing’

Presented in the Acoustic Localization IV session, afternoon of December 8, 2020

179th ASA Meeting, Acoustics Virtually Everywhere


The Littoral Acoustic Demonstration Center – Gulf Ecological Monitoring and Modeling (LADC-GEMM) collected underwater acoustic data in the Northern Gulf of Mexico (GoM) from 2002 through 2017. Figure 1 shows the collection sites and the location of the BP oil spill of April 2010. The data are collected by a hydrophone, an underwater microphone that records the acoustic signals, or sounds, of the region.


One of the goals of the research at the University of New Orleans (UNO) is to identify individual marine mammals by their acoustic signal.  Part of this identification includes being able to locate them.   In this paper we will briefly explain how we are attempting to locate sperm whales in the GoM.


First, we need to understand how the whale’s sounds travel through the water and what happens to them as they do. Any sound that travels through a medium (air, water, or any material) will have its loudness decreased. For example, it is much easier to hear a person talking to you when you are in the same room; if they are talking to you through a wall, their voice level is reduced because the signal travels through a medium (the wall) that reduces its loudness. Therefore, as the whale’s signal travels through the GoM to our hydrophones, its loudness is reduced. The size of this effect is determined by the temperature, the depth of the recording device below the surface, the salinity, and the pH level of the water. Using this information, we can determine how much the loudness of the whale’s signal decreases per kilometer that the signal travels. This can be seen in Figure 2.
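The full absorption model used in such work depends on temperature, salinity, pH and depth. As a simpler stand-in, the sketch below implements Thorp's classic empirical formula for seawater absorption (frequency in kHz, result in dB per km); this is not the exact model of the paper, only an illustration of how loudness loss per kilometer grows with frequency.

```python
def thorp_absorption_db_per_km(f_khz):
    """Thorp's empirical attenuation formula for seawater (dB/km),
    a reasonable approximation from a few hundred Hz to tens of kHz."""
    f2 = f_khz ** 2
    return (0.11 * f2 / (1.0 + f2)        # boric acid relaxation
            + 44.0 * f2 / (4100.0 + f2)   # magnesium sulfate relaxation
            + 2.75e-4 * f2                # pure-water viscosity
            + 0.003)                      # low-frequency constant

# At 10 kHz, in the band of sperm whale clicks, absorption is roughly
# 1.2 dB/km, and it rises steeply at higher frequencies.
alpha_10 = thorp_absorption_db_per_km(10.0)
```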



We use the known loudness of the sound emitted by a sperm whale and the recorded loudness of the signal, together with the effect of the GoM on the signal, to determine how far the sperm whale is from our hydrophone. Unfortunately, due to technical limitations of the equipment we can do this with only a single hydrophone, so we cannot currently locate the sperm whale’s exact position; we can only say that it lies somewhere at a certain distance around the hydrophone. Figure 3 graphically shows the results of our calculations for two of the 276 sperm whale signals we used with our model to estimate how far the whale is from our hydrophone.
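The distance estimate above amounts to inverting a transmission-loss equation. A minimal sketch, assuming spherical spreading plus a fixed absorption coefficient (the source level, received level, and absorption values below are illustrative, not the paper's data):

```python
import math

def transmission_loss_db(r_m, alpha_db_per_km):
    """Spherical spreading (20 log10 r) plus absorption over range r."""
    return 20.0 * math.log10(r_m) + alpha_db_per_km * r_m / 1000.0

def range_from_levels(source_db, received_db, alpha_db_per_km,
                      r_lo=1.0, r_hi=100_000.0):
    """Invert SL - RL = TL(r) for range r by bisection; TL increases
    monotonically with r, so the root is unique."""
    target = source_db - received_db
    for _ in range(100):
        mid = 0.5 * (r_lo + r_hi)
        if transmission_loss_db(mid, alpha_db_per_km) < target:
            r_lo = mid
        else:
            r_hi = mid
    return 0.5 * (r_lo + r_hi)

# Example: a click with an assumed 230 dB source level received at
# 150 dB, with 1.2 dB/km absorption.
r = range_from_levels(230.0, 150.0, 1.2)

# Round-trip check at a known range of 3 km.
rl_3km = 230.0 - transmission_loss_db(3000.0, 1.2)
r_check = range_from_levels(230.0, rl_3km, 1.2)
```

With one hydrophone this yields only a distance, which is exactly why the result is a circle around the hydrophone rather than a point position.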