3aUW8 – A view askew: Bottlenose dolphins improve echolocation precision by aiming their sonar beam to graze the target – Laura N. Kloepper

A view askew: Bottlenose dolphins improve echolocation precision by aiming their sonar beam to graze the target

Laura N. Kloepper – lkloepper@saintmarys.edu
Saint Mary’s College
Notre Dame, IN 46556


Yang Liu – yang.liu@umassd.edu
John R. Buck – jbuck@umassd.edu
University of Massachusetts Dartmouth
285 Old Westport Road
Dartmouth, MA 02747


Paul E. Nachtigall – nachtiga@hawaii.edu
University of Hawaii at Manoa
PO Box 1346
Kaneohe, HI 96744


Popular version of paper 3aUW8, “Bottlenose dolphins direct sonar clicks off-axis of targets to maximize Fisher Information about target bearing”

Presented Wednesday morning, November 4, 2015, 10:25 AM in River Terrace 2

170th ASA Meeting, Jacksonville


Bottlenose dolphins are incredible echolocators. Using just sound, they can detect an object the size of a ping-pong ball from 100 m away, and discriminate between objects differing in thickness by less than 1 mm. Based on what we know about man-made sonar, however, the dolphins’ sonar abilities are an enigma – simply put, they shouldn’t be as good at echolocation as they actually are.

Typical man-made sonar devices achieve high levels of performance by using very narrow sonar beams. Creating narrow beams requires large and costly equipment. In contrast to these man-made sonars, bottlenose dolphins achieve the same levels of performance with a sonar beam that is many times wider – but how? Understanding their “sonar secret” can help lead to more sophisticated synthetic sonar devices.

Bottlenose dolphins’ echolocation signals contain a wide range of frequencies. The higher frequencies propagate away from the dolphin in a narrower beam than the low frequencies do. This means the emitted sonar beam of the dolphin is frequency-dependent. Objects directly in front of the animal echo back all of the frequencies. However, as we move out of the direct line in front of the animal, there is less and less high frequency, and when the target is far off to the side, only the lower frequencies reach the target to bounce back. As shown below in Fig. 1, an object 30 degrees off the sonar beam axis receives little of the higher-frequency content.




Figure 1. Beam pattern and normalized amplitude as a function of signal frequency and bearing angle. At 0 degrees, or on-axis, the beam contains an equal representation across all frequencies. As the bearing angle deviates from 0, however, the higher frequency components fall off rapidly.

Consider an analogy to light shining through a prism. White light entering the prism contains every frequency, but the light leaving the prism at different angles contains different colors. If we moved a mirror to different angles along the light beam, the color it reflected would change as it moved through different regions of the transmitted beam. If we were very good, we could locate the mirror precisely in angle based on the color reflected. If the color changes more rapidly with angle in one region of the beam, we would be most sensitive to small changes in position at that angle, since small changes in position would create large changes in color. In mathematical terms, this region of maximum change would have the largest gradient of frequency content with respect to angle. The dolphin sonar appears to be exploiting a similar principle, except that the different colors are different frequencies, or pitches, in the sound.

Prior studies on bottlenose dolphins assumed the animal pointed its beam directly at the target, but this assumption resulted in the conclusion that the animals shouldn’t be as “good” at echolocation as they actually are. What if, instead, they use a different strategy? We hypothesized that the dolphin might be aiming its sonar so that the main axis of the beam passes next to the target, which results in the region of maximum gradient falling on the target. Our model predicts that placing the region of the beam most sensitive to change on the target will give the dolphin the greatest precision in locating the object.

To test our hypothesis, we trained a bottlenose dolphin to detect the presence or absence of an aluminum cylinder while we recorded the echolocation signals with a 16-element hydrophone array (Fig. 2).



Figure 2: Experimental setup. The dolphin detected the presence or absence of cylinders at different distances while we recorded sonar beam aim with a hydrophone array.

We then measured where the dolphin directed its sonar beam in relation to the target and found the dolphin pointed its sonar beam 7.05 ± 2.88 degrees (n=1930) away from the target (Fig. 3).




Figure 3: Optimality in directing beam away from axis. The numbers on the emitted beam represent the attenuation in decibels relative to the sound emitted from the dolphin. The high frequency beam (red) is narrower than the low frequency beam (blue) and attenuates more rapidly with angle. The dolphin directs its sonar beam 7 degrees away from the target.

To then determine if certain regions of the sonar beam provide more theoretical “information” to the dolphin, which would improve its echolocation, we applied information theory to the dolphin sonar beam. Using the weighted frequencies present in the signal, we calculated the Fisher Information for the emitted beam of a bottlenose dolphin. From our calculations we determined 95% of the maximum Fisher Information to be between 6.0 and 8.5 degrees off center, with a peak at 7.2 degrees (Fig. 4).




Figure 4: The calculated Fisher Information as a function of bearing angle. The information is highest between 6.0 and 8.5 degrees off center, with a peak at 7.2 degrees.
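The idea behind the Fisher Information calculation can be illustrated with a minimal sketch. It is not the model used in the study: it assumes each frequency component has a Gaussian beam shape, that the beam widths and weights listed are hypothetical, and that each component is observed with independent Gaussian noise. Under those assumptions the information about bearing is the sum of the squared slopes of the beam patterns, which is zero on-axis and largest somewhere off-axis.

```python
import math

# Hypothetical frequency components: (weight, beam width in degrees).
# Higher frequencies get narrower beams. These values are assumptions
# chosen for illustration, not measurements from the study.
COMPONENTS = [(1.0, 12.0), (1.0, 9.0), (1.0, 7.0), (1.0, 5.0)]
NOISE_VARIANCE = 1.0  # assumed equal for every component


def amplitude(theta_deg, width_deg):
    """Assumed Gaussian beam pattern, normalized to 1 on-axis."""
    return math.exp(-theta_deg ** 2 / (2 * width_deg ** 2))


def fisher_information(theta_deg):
    """Sum over components of (dA/dtheta)^2 / noise variance."""
    total = 0.0
    for weight, width in COMPONENTS:
        slope = -theta_deg / width ** 2 * amplitude(theta_deg, width)
        total += weight * slope ** 2 / NOISE_VARIANCE
    return total


# Scan bearing angles from 0 to 30 degrees in 0.1 degree steps.
angles = [i / 10 for i in range(0, 301)]
peak_angle = max(angles, key=fisher_information)
print("Information on-axis:", fisher_information(0.0))
print("Angle of maximum information (degrees):", peak_angle)
```

With this toy beam the peak always falls between the narrowest and widest beam widths, so a target placed on-axis sits where the beam carries no bearing information at all.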

The result? The dolphin is using a strategy that is mathematically optimal! By directing its sonar beam slightly askew of the target (such as a fish), the dolphin places the target in the highest frequency gradient of the beam, allowing it to locate the target more precisely.

2pAAa10 – Turn around when you’re talking to me! – Jennifer Whiting, Timothy Leishman, PhD, K.J. Bodon

Turn around when you’re talking to me!

Jennifer Whiting – jkwhiting@physics.byu.edu

Timothy Leishman, PhD – tim_leishman@physics.byu.edu

K.J. Bodon – joshuabodon@gmail.com

Brigham Young University

N283 Eyring Science Center

Provo, UT 84602


Popular version of paper 2pAAa10, “High-resolution measurements of speech directivity”

Presented Tuesday afternoon, November 3, 2015, 4:40 PM, Grand Ballroom 3

170th ASA Meeting, Jacksonville


In general, most sources of sound do not radiate equally in all directions. The human voice is no exception to this rule. How strongly sound is radiated in a given direction at a specific frequency, or pitch, is called directivity. While many researchers have studied the directivity of speaking and singing voices, some important details are missing. The research reported in this presentation measured directivity of live speech at higher angular and frequency resolutions than have been previously measured, in an effort to capture the missing details.

Measurement methods

The approach uses a semicircular array of 37 microphones spaced at five-degree polar-angle increments (see Figure 1). A subject sits on a computer-controlled rotating chair with his or her mouth aligned at the axis of rotation and circular center of the microphone array. He or she repeats a series of phonetically balanced sentences at each of 72 five-degree azimuthal-angle increments. This results in 2522 measurement points on a sphere around the subject.
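The count of 2522 points can be checked with a short sketch. It assumes the 37 microphones span polar angles from 0 to 180 degrees, so that the top and bottom microphones sit on the rotation axis and measure the same point at every chair position; that geometry is an inference from the numbers given, not a detail stated above.

```python
# Assumed geometry: polar angles 0, 5, ..., 180 degrees (37 microphones)
# and chair rotations 0, 5, ..., 355 degrees (72 positions).
polar_angles = range(0, 181, 5)
azimuth_angles = range(0, 360, 5)

unique_points = set()
for polar in polar_angles:
    for azimuth in azimuth_angles:
        # At the poles every azimuth describes the same point on the sphere.
        if polar in (0, 180):
            unique_points.add((polar, 0))
        else:
            unique_points.add((polar, azimuth))

raw_recordings = len(polar_angles) * len(azimuth_angles)
print("Recordings made:", raw_recordings)
print("Unique points on the sphere:", len(unique_points))
```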


[Figure 1. A subject and the measurement array]


The measurements are based on audio recordings of the subject, who tries to repeat the sentences with exactly the same timing and inflection at each rotation. To account for the inevitable differences in each repetition, a transfer function and the coherence between a reference microphone near the subject and a measurement microphone on the semicircular array are computed. The coherence is used to examine how good each measurement is. The transfer function for each measurement point makes up the directivity. To visualize the results, each measurement is plotted on a sphere, where the color and the radius at each point indicate how strongly sound is radiated in that direction for a given frequency. Animations of these spherical plots show how the directivity differs for each frequency.
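A transfer function and coherence estimate of this kind can be sketched as follows. This is a minimal illustration using averaged cross-spectra over short segments, not the processing chain used in the study; the signals, segment length, and function names are all hypothetical, and a naive Fourier transform is used only to keep the example self-contained.

```python
import cmath
import random


def dft(x):
    """Naive discrete Fourier transform; returns bins 0..N/2."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]


def transfer_and_coherence(ref, meas, seg_len=16):
    """Average auto- and cross-spectra over non-overlapping segments."""
    n_bins = seg_len // 2 + 1
    sxx = [0.0] * n_bins
    syy = [0.0] * n_bins
    sxy = [0j] * n_bins
    for start in range(0, len(ref) - seg_len + 1, seg_len):
        X = dft(ref[start:start + seg_len])
        Y = dft(meas[start:start + seg_len])
        for k in range(n_bins):
            sxx[k] += abs(X[k]) ** 2
            syy[k] += abs(Y[k]) ** 2
            sxy[k] += X[k].conjugate() * Y[k]
    transfer = [sxy[k] / sxx[k] for k in range(n_bins)]
    coherence = [abs(sxy[k]) ** 2 / (sxx[k] * syy[k]) for k in range(n_bins)]
    return transfer, coherence


random.seed(0)
reference = [random.gauss(0, 1) for _ in range(256)]       # stand-in for the reference microphone
clean = [0.5 * r for r in reference]                       # array microphone, half amplitude, no noise
noisy = [0.5 * r + random.gauss(0, 1) for r in reference]  # same, with added noise

h_clean, coh_clean = transfer_and_coherence(reference, clean)
h_noisy, coh_noisy = transfer_and_coherence(reference, noisy)
print("Clean coherence (first bins):", [round(c, 3) for c in coh_clean[:3]])
print("Noisy coherence (first bins):", [round(c, 3) for c in coh_noisy[:3]])
```

The noise-free case gives a coherence of one and recovers the gain of 0.5, while added noise pulls the coherence down, which is the behavior used to judge measurement quality.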

[Figure 2. Balloon plot for male speech directivity at 500 and 1000 Hz.]

[Figure 3. Balloon plot for female speech directivity at 500 and 1000 Hz.]


[Animation 1. Male Speech Directivity, animated]

[Animation 2. Female Speech Directivity, animated]

Results and Conclusions

Some unique results are visible in the animations. Most importantly, as frequency increases, one can see that most of the sound is radiated in the forward direction. This is one reason why it’s hard to hear someone talking in the front of a car when you’re sitting in the back, unless they turn around to talk to you. One can also see in the animations that as frequency increases, and most of the sound radiates forwards, there is poor coherence in the back area. This doesn’t necessarily indicate a poor measurement, just a poor signal-to-noise ratio, since there is little sound energy in that direction. It’s also interesting to see that the polar angle of the strongest radiation changes with frequency. At some frequencies the sound is radiated strongly downward and to the sides, but at other frequencies the sound is radiated strongly upwards and forwards. Male and female directivities are similar in shape, but at different frequencies, since the fundamental frequencies of male and female voices are so different.

A more complete understanding of speech directivity has great benefits for several industries. For example, hearing aid companies can use speech directivity patterns to know where to aim microphones in the hearing aids to pick up the best sound for a wearer having a conversation. Microphone placement in cell phones can be adjusted to get a clearer signal from those talking into the cell phone. The theater and audio industries can use directivity patterns to assist in positioning actors on stage, or placing microphones near the speakers to record the most spectrally rich speech. The scientific community can develop more complete models for human speech based on these measurements. Further study on this subject will allow researchers to improve the measurement method and analysis techniques to more fully understand the results, and generalize them to all speech containing phonemes similar to those in these measurements.

2aSP5 – Using Automatic Speech Recognition to Identify Dementia in Early Stages – Roozbeh Sadeghian, J. David Schaffer, and Stephen A. Zahorian

Using Automatic Speech Recognition to Identify Dementia in Early Stages

Roozbeh Sadeghian, J. David Schaffer, and Stephen A. Zahorian
SUNY at Binghamton
Binghamton, NY


Popular version of paper 2aSP5, “Using automatic speech recognition to identify dementia in early stages”
Presented Tuesday morning, November 3, 2015, 10:15 AM, City Terrace room
170th ASA Meeting, Jacksonville, Fl

The clinical diagnosis of Alzheimer’s disease (AD) and other dementias is very challenging, especially in the early stages. Dementia is widely believed to be underdiagnosed, at least partially because of the lack of a reliable non-invasive diagnostic test. Additionally, recruitment for clinical trials of experimental dementia therapies might be improved with a highly specific test. Although there is much active research into new biomarkers for AD, most of these methods are expensive and/or invasive, such as brain imaging (often with radioactive tracers) or taking blood or spinal fluid samples followed by expensive lab procedures.

There are good indications that dementias can be characterized by several aphasias (defects in the use of speech). This seems plausible since speech production involves many brain regions, and thus a disease that affects particular regions involved in speech processing might leave detectable fingerprints in the speech. Computerized analysis of speech signals and computational linguistics (analysis of word patterns) have progressed to the point where an automatic speech analysis system could be within reach as a tool for detection of dementia. The long-term goal is an inexpensive, short-duration, non-invasive test for dementia, one that can be administered in an office or home by clinicians with minimal training.

If a pilot study (cross sectional design: only one sample from each subject) indicates that suitable combinations of features derived from a voice sample can strongly indicate disease, then the research will move to a longitudinal design (many samples collected over time) where sizable cohorts will be followed so that early indicators might be discovered.

A simple procedure for acquiring speech samples is to ask subjects to describe a picture (see Figure 1). Some such samples are available on the web (DementiaBank), but they were collected long ago and the audio quality is often poor. We used 140 of these older samples, but also collected 71 new samples with good quality audio. Roughly half of the samples had a clinical diagnosis of probable AD, and the others were demographically similar and cognitively normal (NL).

One hundred twenty-eight features were automatically extracted from speech signals, including pauses and pitch variation (indicating emotion); word-use features were extracted from manually prepared transcripts. In addition, we had the results of a popular cognitive test, the mini mental state exam (MMSE), for all subjects. While widely used as an indicator of cognitive difficulties, the MMSE is not sufficiently diagnostic for dementia by itself. We searched for patterns with and without the MMSE. This gives the possibility of a clinical test that combines speech with the MMSE. Multiple patterns were found using an advanced pattern discovery approach (genetic algorithms with support vector machines). The performances of two example patterns are shown in Figure 2. The training samples (red circles) were used to discover the patterns, so we expect them to perform well. The validation samples (blue) were not used for learning, only to test the discovered patterns. If we say that a subject will be declared AD if the test score is > 0.5 (the red line in Figure 2), we can see some errors: in the left panel we see one false positive (NL case with a high test score, blue triangle) and several false negatives (AD cases with low scores, red circles).
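The decision rule described above, declaring AD when the test score exceeds 0.5, can be sketched as a simple count of correct and incorrect calls. The scores and labels below are invented for illustration only and are not data from the study.

```python
def confusion_counts(scores, labels, threshold=0.5):
    """Count outcomes when a score above the threshold is declared AD."""
    counts = {"true_positive": 0, "false_positive": 0,
              "true_negative": 0, "false_negative": 0}
    for score, label in zip(scores, labels):
        declared_ad = score > threshold
        if declared_ad and label == "AD":
            counts["true_positive"] += 1
        elif declared_ad and label == "NL":
            counts["false_positive"] += 1   # NL case with a high score
        elif not declared_ad and label == "NL":
            counts["true_negative"] += 1
        else:
            counts["false_negative"] += 1   # AD case with a low score
    return counts


# Hypothetical test scores and clinical labels (not real subjects).
example_scores = [0.9, 0.7, 0.4, 0.2, 0.6, 0.1]
example_labels = ["AD", "AD", "AD", "NL", "NL", "NL"]

counts = confusion_counts(example_scores, example_labels)
sensitivity = counts["true_positive"] / (counts["true_positive"] + counts["false_negative"])
specificity = counts["true_negative"] / (counts["true_negative"] + counts["false_positive"])
print(counts)
print("Sensitivity:", round(sensitivity, 2), "Specificity:", round(specificity, 2))
```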





Figure 1. The pictures used for recording samples: (a) the famous cookie theft picture, used for the older samples, and (b) the picture used for the newly recorded samples.



Figure 2. Two discovered diagnostic patterns (left with MMSE) (right without MMSE). The normal subjects are to the left in each plot (low scores) and the AD subjects to the right (high scores). No perfect pattern has yet been discovered. 

As mentioned above, manually prepared transcripts were used for these results, since automatic speaker-independent speech recognition is very challenging for small highly variable data sets.  To be viable, the test should be completely automatic.  Accordingly, the main emphasis of the research presented at this conference is the design of an automatic speech-to-text system and automatic pause recognizer, taking into account the special features of the type of speech used for this test of dementia.



4pMU5 – “Evolution of the piano” – Nicholas Giordano

“Evolution of the piano”

Nicholas Giordano – nig003@auburn.edu

Auburn University

Auburn, AL


Popular version of paper 4pMU5, “Evolution of the piano”

Presented Thursday afternoon, November 5, 2015, 2:25 PM, Grand Ballroom 2

170th ASA Meeting, Jacksonville, Fl



The piano was invented 300 years ago by Bartolomeo Cristofori, who in his “day job” was responsible for the instruments owned by the famous Medici family in Florence, Italy. Many of those instruments were harpsichords, and the first pianos were very similar to a harpsichord with one crucial difference. In a harpsichord the strings are set into motion by plucking (as in a guitar) and the amplitude of a pluck is independent of how forcefully a key is pressed.  In a piano the strings are struck with a hammer and Cristofori invented a clever mechanism (called the piano “action”) through which the speed of the hammer and hence the volume of a tone is controlled by the force with which a key is pressed. In this way a piano player can vary the loudness of notes individually, something that was not possible with the harpsichord or organ, the dominant keyboard instruments of the day. This gave the piano new expressive capabilities which were soon exploited by composers such as Mozart and Beethoven.

Figure 1 shows one of the three existing Cristofori pianos. It is composed almost entirely of wood (except for the strings) and has a range of 4 octaves – 49 notes. It has 98 strings (two for each note), each held at a tension of about 60 Newtons (around 13 lbs), and is light enough that two adults can easily lift it. A typical modern piano is shown in Figure 2. It has a range of 7-1/3 octaves – 88 notes – and more than 200 strings (most notes have three strings), each held at a tension of around 600 Newtons. This instrument weighs almost 600 lbs.



Figure 1. Piano built by Bartolomeo Cristofori in 1722. This piano is in the Museo Nazionale degli Strumenti Musicali in Rome. Image from Wikimedia Commons (wikimedia.org/wikipedia/commons/3/32/Piano_forte_Cristofori_1722.JPG). The other pianos made by Cristofori and still in existence are in the Metropolitan Museum of Art in New York City and the Musikinstrumenten-Museum in Leipzig.


Figure 2. A typical modern piano. This is a Steinway model M that belongs to the author. Photo by Lizz Giordano.


My conference paper considers how the piano in Figure 1 evolved into the instrument in Figure 2. As is described in the paper, this evolution was driven by a combination of factors including the capabilities and limitations of the human auditory system, the demands of composers ranging from Mozart to Beethoven to Rachmaninoff, and developments in technology such as the availability of the high strength steel wire that is now used for the strings.


How many notes?

The modern piano has nearly twice as many notes as the pianos of Cristofori. These additional notes were added gradually over time. Most of the keyboard music of J. S. Bach can be played on the 49 notes of the first pianos, but composers soon wanted more. By Mozart’s time in the late 1700s, most pianos had 61 notes (a five octave range). They expanded to 73 notes (six octaves) for Beethoven in the early 1800s, and eventually to the 88 notes we have today by about 1860. The frequency range covered by these notes extends from around 25 Hz to just over 4000 Hz. The human ear is sensitive to a much wider range so one might ask “why don’t we have even more notes?” The answer seems to lie in the way we hear tones with frequencies that are much outside the piano range. Tones with frequencies below the piano range are heard by most people as clicks [1], and such tones would not be useful for most kinds of music. Tones with frequencies much above the high end of the piano range pose a different problem. In much music two or more tones are played simultaneously to produce chords and similar combinations. It turns out that our auditory system is not able to perceive such “chordal” relationships for tones much above the piano range [1]. Hence, these tones cannot be used by a composer to form the chords and other note combinations that are an essential part of western music. The range of notes found in a piano is thus determined by the human auditory system – this is why the number of notes found in a piano has not increased beyond the limits reached about 150 years ago.
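The frequency range quoted above can be reproduced with a short sketch. It assumes modern equal temperament with the A above middle C (taken here as key 49 of 88) tuned to 440 Hz; these tuning conventions are assumptions of the example and are not stated in the text.

```python
def key_frequency(key_number, a4_key=49, a4_hz=440.0):
    """Frequency of a piano key, assuming equal temperament and A4 = 440 Hz."""
    return a4_hz * 2 ** ((key_number - a4_key) / 12)


lowest = key_frequency(1)    # lowest note of an 88-key piano
highest = key_frequency(88)  # highest note of an 88-key piano
print("Lowest note (Hz):", round(lowest, 1))
print("Highest note (Hz):", round(highest, 1))
```

Under these assumptions the lowest note comes out slightly above 25 Hz and the highest just over 4000 Hz, in line with the range given above.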


Improving the strings

The piano strings in Cristofori’s piano were thin (less than 1 mm in diameter) and composed of brass or iron. They were held at tensions of about 60 N, which was probably a bit more than half their breaking tensions, providing a margin of safety. An increase in tension allows the string to be hit harder with the hammer, producing a louder sound. Hence, as the piano came to be used more and more as a solo instrument and as concert halls grew in size, piano makers needed to incorporate stronger strings. These improved strings were generally composed of iron with controlled amounts of impurities such as carbon. The string tensions used in piano design thus increased by about a factor of 10 from the earliest pianos to around 1860 at which time steel piano wire was available. Steel wire continues to be used in modern pianos, but the strength of modern steel wire is not much greater than the wire available in 1860, so this aspect of piano design has not changed substantially since that time.


Making a stronger case

The increased number of strings in a modern piano combined with the greater string tension results in much larger forces, by about a factor of 20, on the case of a modern instrument as compared to the Cristofori piano. The case of an early piano was made from wood but the limits of a wooden case were reached by the early 1800s in the pianos that Beethoven encountered. To cope with this problem, piano makers then added metal rods and later plates to strengthen the case, leading to what is now called a “full metal plate.” The plate is now composed of iron (steel is not required since iron under compression is quite strong and stable) and is visible in Figure 2 as the gold colored plate that extends from the front to the back of the instrument. Some piano makers objected to adding metal to the piano, arguing that it would give the tone a “metallic” sound. They were evidently able to overlook the fact that the strings were already metal. Interestingly, the full metal plate was the first important contribution to piano design by an American, as it was introduced in the mid-1820s by Alpheus Babcock.
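The factor of about 20 follows from the string counts and tensions given earlier. The sketch below uses 200 strings for the modern piano, which is a lower bound since the text says "more than 200", so the true ratio is somewhat larger.

```python
# Figures quoted in the article; 200 is a lower bound for the modern string count.
cristofori_strings = 98
cristofori_tension_n = 60    # newtons per string
modern_strings = 200
modern_tension_n = 600       # newtons per string

cristofori_total = cristofori_strings * cristofori_tension_n
modern_total = modern_strings * modern_tension_n
ratio = modern_total / cristofori_total

print("Cristofori total string force (N):", cristofori_total)
print("Modern total string force (N):", modern_total)
print("Ratio:", round(ratio, 1))
```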


Making a piano hammer

As the string tension increased it was also necessary to redesign the piano hammer. In most early pianos the hammer was fairly light (about 1 g or less), with a layer of leather glued over a wooden core. As the string tension grew a more durable covering was needed, and leather was replaced by felt in the mid-1800s. This change was made possible by improvements in the technology of making felt with a high and reproducible density. The mass of the hammer also increased; in a modern piano the hammers for the bass (lowest) notes have a mass more than 10 times greater than in Cristofori’s instruments.


How has the sound changed?

We have described how the strings, case, hammers, and range of the piano have changed considerably since Cristofori invented the instrument, and there have been many other changes as well. It is thus not surprising that the sounds produced by an early piano can be distinguished from those of a modern piano. However, the tones of these instruments are remarkably similar – even the casual listener will recognize both as coming from a “piano.” While there are many ways to judge and describe a piano tone, the properties of the hammers are, in the opinion of the author (an amateur pianist), most responsible for the differences in the tones of early and modern pianos. The collision between the hammer and string has a profound effect on the tone, and the difference in the hammer covering (leather versus felt) makes the tone of an early piano sound more “percussive” and “pluck-like” than that of a modern piano. This difference can be heard in sound examples that accompany this article.


The future of the piano

While the piano is now 300 years old, its evolution from Cristofori’s first instruments to the modern piano was complete by the mid-1800s. Why has the piano remained unchanged for the past 150 years? We have seen that much of the evolution was driven by improvements in technology such as the availability of steel wire that is now used for the strings. Modern steel wire is not much different than that available more than a century ago, but other string materials are now available. For example, wires made of carbon fibers can be stronger than steel and would seem to have advantages as piano strings [2], but this possibility has not (yet) been explored in more than a theoretical way. Indeed, the great success of the piano has made piano makers, players, and listeners resistant to major changes. While new technologies or designs will probably be incorporated into the pianos of the future, it seems likely that it will always sound much like the instrument we have today.


The evolution of the piano is described in more detail in an article by the author that will appear in Acoustics Today later this year. Much longer and more in-depth versions of this story can be found in Refs. 3 and 4.


[1] C. J. Plack, A. J. Oxenham, R. R. Fay, and A. N. Popper (2005). Pitch: Neural Coding and Perception (Springer), Chapter 2.

[2] N. Giordano (2011). “Evolution of music wire and its impact on the development of the piano,” Proceedings of Meetings on Acoustics 12, 035002.

[3] E. M. Good (2002). Giraffes, Black Dragons, and Other Pianos, 2nd edition (Stanford University Press).

[4] N. J. Giordano (2010). Physics of the Piano (Oxford University Press).



Sound examples

Both of these audio examples are the beginning of the first movement of Mozart’s piano sonata in C major, K. 545. The first one is played on a piano that is a copy of an instrument like the ones Mozart played. The second audio example is played on a modern piano.


(1) Early piano. Played by Malcolm Bilson on a copy of a c. 1790 piano made by Paul McNulty (CD: Hungaroton Classic, Wolfgang Amadeus Mozart Sonatas Vol. III, Malcolm Bilson, fortepiano, HCD31013-14).


(2) Modern piano. Played by Daniel Barenboim on a modern (Steinway) piano (CD: EMI Classics, Mozart, The Piano Sonatas, Catalog #67294).


1pABb1 – Mice ultrasonic detection and localization in laboratory environment – Yegor Sinelnikov

Mice ultrasonic detection and localization in laboratory environment


Yegor Sinelnikov – yegor.sinelnikov@gmail.com
Alexander Sutin, Hady Salloum, Nikolay Sedunov, Alexander Sedunov
Stevens Institute of Technology
Hoboken, NJ 07030

Tom Zimmerman, Laurie Levine
DLAR Stony Brook University
Stony Brook, NY 11790

David Masters
Department of Homeland Security
Science and Technology Directorate
Washington, DC


Popular version of poster 1pABb1, “Mice ultrasonic detection and localization in laboratory environment”
Presented Tuesday afternoon, November 3, 2015, 3:30 PM, Grand Ballroom 3
170th ASA Meeting, Jacksonville


The house mouse, Mus musculus, has long shared the human environment without much permission. It lives in our homes, enjoys our husbandry, and passes through walls and administrative borders unnoticed and unaware of our wary attention. Over thousands of years of coexistence, mice have excelled at a carrot-and-stick approach. Likewise, an ordinary wild mouse brings both danger and cure to humans today. The danger comes in the form of rodent-borne diseases; among them, plague epidemics, well remembered from medieval European history, continue to pose a threat to human health. The cure comes from mice serving as research subjects for new therapeutic agents, a role they owe to their genomic similarity to humans, small size, and short life span. Moreover, physiological similarities in inner ear construction and brain auditory responses, together with an unexpected richness in vocal signaling, have attracted tremendous interest in mouse bioacoustics and emotion perception.

The goal of this work is to start addressing possible threats reportedly carried by invasive species crossing US borders unnoticed in multiple cargo containers. This study focuses on demonstrating the feasibility of acoustic detection of potential rodent intrusions.

Animals communicate with smell, touch, movement, visual signaling and sound. Mice came well equipped with sensory abilities to face the challenge of sharing a habitat with humans. Mice gave up color vision, developed exceptional stereoscopic smell, and learned to be deceptively quiet in the human auditory range, discreetly shifting their social acoustic interaction to higher frequencies. They predominantly use ultrasonic frequencies above the human hearing range as a part of their day-to-day non-aggressive social interaction. Intricate ultrasonic mouse songs, composed of multiple syllable sounds that often form complex phrases separated by periods of silence, are well known to researchers.

In this study, mouse sounds were recorded in a laboratory environment at an animal facility at Stony Brook University Hospital. The mice were allowed to move freely, a major condition for their vocalization in the ultrasonic range. Confined to cages, mice did not produce ultrasonic signals. Four different microphones with flat ultrasonic frequency response were positioned in various arrangements and at various distances from the subjects. The distances varied from a few centimeters to several meters. An exemplary setup is shown in Figure 1. Three microphones, sensitive in the frequency range between 20 kHz and 100 kHz, were connected via preamplifiers and digital converters to a computer equipped with dedicated sound recording software. The fourth, calibrated microphone was used to measure the absolute sound level produced by a mouse. The spectrograms were monitored by an operator in real time to detect the onset of mouse communications and simplify off-line data processing.



Figure 1. Setup of the experiment showing the three microphones (a) on a table with an unrestrained mouse (b), the recording equipment with preamplifiers and digitizers (c), and the computer (d).

Listen to a single motif of mice ultrasonic vocalization and observe mouse movement here:

This sound fragment was slowed down by a factor of fifteen to make it audible. In reality, mouse social songs are well above the human audible range and are very fast. The spectrograms of mouse vocalization at distances of 1 m and 5 m are shown in Figure 2. Mouse vocalization was detectable at 5 m and retained a recognizable vocalization pattern. Farther distances were not tested due to the limitation of the room size.

The real-time detection of mouse vocalization required a fast, noise-insensitive, and automated algorithm. An innovative approach was required. Although no animal communication comes close to being a language, the richness and diversity of mouse ultrasonic vocalization prompted us to apply speech processing measures for their real-time detection. A number of generic speech processing measures, such as temporal signal-to-noise ratio, cepstral distance, and likelihood ratio, were tested for the detection of mouse vocalization events in the presence of background noise. These measures were calculated from acoustical measurements and compared with conventional techniques, such as bandpass filtering, spectral power, or continuous monitoring of signal frames for the presence of expected tones.



Figure 2. Sonograms of short ultrasonic vocalization syllables produced by mice at 1 m (left) and 5 m (right) distances from microphones. The color scale is in decibels.

Although speech processing measures were invented to assess human speech intelligibility, we found them applicable to the acoustic detection of mice within a few meters. Leaving aside the question of the intelligibility of mouse vocalization, we concluded that selected speech processing measures enabled us to detect mouse vocalization events better than other generic signal processing techniques.

As a secondary goal of this study, upon successful acoustic detection, the mouse vocalization needed to be processed to determine the animal’s location. Localization is of primary interest for border patrol applications, where both acoustic detection and spatial localization are critical, and because mouse movement is behaviorally specific. To prove the feasibility of localization, detected vocalization events from each microphone pair were processed to determine the time difference of arrival (TDOA). The analysis was limited to nearby locations by the relatively short cabling system. Because the animals were moving freely on the surface of a laboratory table, roughly coplanar with the microphones, the TDOA values were converted to the animal’s location using a simple triangulation scheme. The process is illustrated schematically in Figure 3 for two selected microphones. Note that despite the low signal-to-noise ratio at microphone 2, the vocalization events were successfully detected. The cross correlograms, calculated in the spectral domain with empirical normalization to suppress the effect of uncorrelated noise, yielded reliable TDOA values. A simple check that the TDOA values sum to zero was used as a consistency control. The calculated TDOA values were converted into spatial locations, which were assessed for correctness and for experimental and computational uncertainties, and compared with available video recordings. Despite a relatively high level of man-made noise, the locations calculated from TDOA agreed well with the video recordings. The TDOA localization uncertainty was estimated to be on the order of the mouse size, roughly corresponding to several wavelengths at 50 kHz. A larger number of microphones is expected to improve detectability and enable more precise three-dimensional localization.

Hence, mouse ultrasonic socialization sounds are detectable by the application of speech processing techniques, and their TDOA values are identifiable by cross correlation and provide decent spatial localization of the animals, in agreement with video observations.



Figure 3. The localization process. First, the detected vocalization events from two microphones (left) are paired and their cross correlogram is calculated (middle). The maxima, marked by asterisks, define a set of identified TDOA values. The process is repeated for every pair of microphones. Second, the triangulation is performed (right). The colored hyperbolas illustrate possible locations of the animal on a laboratory table based on the calculated TDOA values. The intersection of the hyperbolas provides the location of the animal. The numbered squares mark the locations of the microphones.
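The TDOA step and the zero-sum consistency check can be illustrated with a minimal sketch. It is a simplification of the method described above: it uses plain time-domain cross correlation rather than the normalized spectral-domain correlograms of the study, and the pulse shape, the delays, the 250 kHz sample rate, and the 343 m/s speed of sound are all assumptions made for the example.

```python
import math

SAMPLE_RATE_HZ = 250_000   # hypothetical sample rate
SPEED_OF_SOUND_M_S = 343   # assumed speed of sound in air


def make_pulse(n, delay, center=60.0, width=4.0, period=5.0):
    """Synthetic ultrasonic syllable: a short tone burst delayed by `delay` samples."""
    return [math.exp(-((i - center - delay) ** 2) / (2 * width ** 2))
            * math.cos(2 * math.pi * (i - center - delay) / period)
            for i in range(n)]


def tdoa_samples(a, b, max_lag=20):
    """Lag (in samples) at which b best matches a; positive if b arrives later."""
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        val = sum(a[i] * b[i + lag] for i in range(len(a)) if 0 <= i + lag < len(b))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag


# Hypothetical arrival delays (in samples) at microphones 1, 2 and 3.
mic1 = make_pulse(160, 0)
mic2 = make_pulse(160, 7)
mic3 = make_pulse(160, 3)

t12 = tdoa_samples(mic1, mic2)
t23 = tdoa_samples(mic2, mic3)
t31 = tdoa_samples(mic3, mic1)

# Consistency check: the delays around the closed loop must cancel.
loop_sum = t12 + t23 + t31

# Each TDOA fixes a path-length difference, which defines one hyperbola.
path_difference_12_m = t12 / SAMPLE_RATE_HZ * SPEED_OF_SOUND_M_S
print("TDOA in samples (1-2, 2-3, 3-1):", t12, t23, t31, "sum:", loop_sum)
print("Path difference between mics 1 and 2 (m):", round(path_difference_12_m, 4))
```

Each pair's path-length difference constrains the source to one hyperbola, and intersecting the hyperbolas from several pairs gives the location, as in the right panel of Figure 3.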


The constructed recording system is particularly important for the detection of mice in containers at US ports of entry, where low frequency noises are high. This pilot study confirms the feasibility of using Stevens Institute’s ultrasonic recording system for simultaneous detection of mice vocalization and movement.

This work was funded by the U.S. Department of Homeland Security’s Science and Technology Directorate. The views and conclusions contained in this paper are those of the authors and should not necessarily be interpreted as representing the official policies, either expressed or implied, of the U.S. Department of Homeland Security.