1aSC31 – Shape changing artificial ear inspired by bats enriches speech signals – Anupam K Gupta

1aSC31 – Shape changing artificial ear inspired by bats enriches speech signals – Anupam K Gupta

Shape changing artificial ear inspired by bats enriches speech signals

Anupam K Gupta1,2 , Jin-Ping Han ,2, Philip Caspers1, Xiaodong Cui2, Rolf Müller1

1 Dept. of Mechanical Engineering, Virginia Tech, Blacksburg, VA, USA
2 IBM T. J. Watson Research Center, Yorktown, NY, USA

Contact: Jin-Ping Han – hanjp@us.ibm.com

Popular version of paper 1aSC31, “Horseshoe bat inspired reception dynamics embed dynamic features into speech signals.”
Presented Monday morning, Novemeber 28, 2016
172nd ASA Meeting, Honolulu


Have you ever had difficulty understanding what someone was saying to you while walking down a busy big city street, or in a crowded restaurant? Even if that person was right next to you? Words can become difficult to make out when they get jumbled with the ambient noise – cars honking, other voices – making it hard for our ears to pick up what we want to hear. But this is not so for bats. Their ears can move and change shape to precisely pick out specific sounds in their environment.

This biosonar capability inspired our artificial ear research and improving the accuracy of automatic speech recognition (ASR) systems and speaker localization. We asked if could we enrich a speech signal with direction-dependent, dynamic features by using bat-inspired reception dynamics?

Horseshoe bats, for example, are found throughout Africa, Europe and Asia, and so-named for the shape of their noses, can change the shape of their outer ears to help extract additional information about the environment from incoming ultrasonic echoes. Their sophisticated biosonar systems emit ultrasonic pulses and listen to the incoming echoes that reflect back after hitting surrounding objects by changing their ear shape (something other mammals cannot do). This allows them to learn about the environment, helping them navigate and hunt in their home of dense forests.

While probing the environment, horseshoe bats change their ear shape to modulate the incoming echoes, increasing the information content embedded in the echoes. We believe that this shape change is one of the reasons bats’ sonar exhibit such high performance compared to technical sonar systems of similar size.

To test this, we first built a robotic bat head that mimics the ear shape changes we observed in horseshoe bats.


Figure 1: Horseshoe bat inspired robotic set-up used to record speech signal



We then recorded speech signals to explore if using shape change, inspired by the bats, could embed direction-dependent dynamic features into speech signals. The potential applications of this could range from improving hearing aid accuracy to helping a machine more-accurately hear – and learn from – sounds in real-world environments.
We compiled a digital dataset of 11 US English speakers from open source speech collections provided by Carnegie Mellon University. The human acoustic utterances were shifted to the ultrasonic domain so our robot could understand and play back the sounds into microphones, while the biomimetic bat head actively moved its ears. The signals at the base of the ears were then translated back to the speech domain to extract the original signal.
This pilot study, performed at IBM Research in collaboration with Virginia Tech, showed that the ear shape change was, in fact, able to significantly modulate the signal and concluded that these changes, like in horseshoe bats, embed dynamic patterns into speech signals.

The dynamically enriched data we explored improved the accuracy of speech recognition. Compared to a traditional system for hearing and recognizing speech in noisy environments, adding structural movement to a complex outer shape surrounding a microphone, mimicking an ear, significantly improved its performance and access to directional information. In the future, this might improve performance in devices operating in difficult hearing scenarios like a busy street in a metropolitan center.


Figure 2: Example of speech signal recorded without and with the dynamic ear. Top row: speech signal without the dynamic ear, Bottom row: speech signal with the dynamic ear



2aABa3 – Indris’ melodies are individually distinctive and genetically driven – Marco Gamba

2aABa3 – Indris’ melodies are individually distinctive and genetically driven – Marco Gamba

Indris’ melodies are individually distinctive and genetically driven

Marco Gamba – marco.gamba@unito.it
Cristina Giacoma – cristina.giacoma@unito.it

University of Torino
Department of Life Sciences and Systems Biology
Via Accademia Albertina 13
10123 Torino, Italy

Popular version of paper 2aABa3 “Melody in my head, melody in my genes? Acoustic similarity, individuality and genetic relatedness in the indris of Eastern Madagascar”
Presented Tuesday morning, November 29, 2016
172nd ASA Meeting, Honolulu


Melody in my head, melody in my genes?
Acoustic similarity, individuality and genetic relatedness in the indris of Eastern Madagascar


Human hearing ablities are exceptional at identifying the voices of friends and relatives [1]. The potential for this identification lies in the acoustic structures of our words, which not only convey verbal information (the meaning of our words) but also non-verbal cues (such as sex and identity of the speakers).

In animal communication, the recognizing a member of the same species can also be important. Birds and mammals may adjust their signals that function for neighbor recognition, and the discrimination between a known neighbor and a stranger would result in strikingly different responses in term of territorial defense [2].

Indris (Indri indri) are the only lemurs that produce group songs and among the few primate species that communicate using articulated singing displays. The most distinctive portions of the indris’ song are called descending phrases, consisting of between two and five units or notes. We recorded 21 groups of indris in the Eastern rainforests of Madagascar from 2005 to 2015. In each recording, we identified individuals using natural markings. We noticed that group encounters were rare, and hypothesized that song might play a role in providing members of the same species with information about the sex and identity of an individual singer and the emitting group.




We found we could effectively discriminate between the descending phrases of an individual indris, showing they have the potential for advertising about sex and individual identity. This strengthened the hypothesis that song may play a role in processes like kinship and mate recognition. Finding that there is was degree of group specificity in the song also supports the idea that neighbor-stranger recognition is also important in the indris and that the song may function announcing territorial occupation and spacing.




Traditionally, primate songs are considered an example of a genetically determined display. Thus the following step in our research was to examine whether the structure of the phrases could relate to the genetic relatedness of the indris. We found a significant correlation between the genetic relatedness of the studied individuals and the acoustic similarity of their song phrases. This suggested that genetic relatedness may play a role in determining song similarity.

For the first time, we found evidence that the similarity of a primate vocal display changes within a population in a way that is strongly associated with kin. When examining differences between sexes we found that male offspring showed phrases that were more similar to their fathers, while daughters did not show similarity with any of their parents.




The potential for kin detection may play a vital role in determining relationships within a population, regulating dispersal, and avoiding inbreeding. Singing displays may advertise kin to signal against potential mating, information that females, and to a lesser degree males, can use when forming a new group. Unfortunately, we still do not know whether indris can perceptually decode this information or how they use it in their everyday life. But work like this sets the basis for understanding primates’ mating and social systems and lays the foundation for better conservation methods.


  1. Belin, P. Voice processing in human and non-human primates. Philosophical Transactions of the Royal Society B: Biological Sciences, 2006. 361: p. 2091-2107.
  2. Randall, J. A. Discrimination of foot drumming signatures by kangaroo rats, Dipodomys spectabilis. Animal Behaviour, 1994. 47: p. 45-54.
  3. Gamba, M., Torti, V., Estienne, V., Randrianarison, R. M., Valente, D., Rovara, P., Giacoma, C. The Indris Have Got Rhythm! Timing and Pitch Variation of a Primate Song Examined between Sexes and Age Classes. Frontiers in Neuroscience, 2016. 10: p. 249.
  4. Torti, V., Gamba, M., Rabemananjara, Z. H., Giacoma, C. The songs of the indris (Mammalia: Primates: Indridae): contextual variation in the long-distance calls of a lemur. Italian Journal of Zoology, 2013. 80, 4.
  5. Barelli, C., Mundry, R., Heistermann, M., Hammerschmidt, K. Cues to androgen and quality in male gibbon songs. PLoS ONE, 2013. 8: e82748.


Figure legends.


Figure 1. A female indri with offspring in the Maromizaha Forest, Madagascar. Maromizaha is a New Protected Area located in the Region Alaotra-Mangoro, east of Madagascar. It is managed by GERP (Primate Studies and Research Group). At least 13 species of lemurs have been observed in the area.

Figure 2. Spectrograms of an indri song showing a typical sequence of different units. In the enlarged area, the pitch contour in red shows a typical “descending phrase” of 4 units. The indris also emit phrases of 2, 3 and more rarely 5 or 6 units.

Figure 3. A 3d-plot of the dimensions (DF1, DF2, DF3) generated from a Discriminant model that successfully assigned descending phrases of four units (DP4) to the emitter. Colours denote individuals. The descending phrases of two (DP2) and three units (DP3) also showed a percentage of correct classification rate significantly above chance.



3aAB7 – Construction Noise Impact on Wild Birds – Pasquale Bottalico, PhD.

3aAB7 – Construction Noise Impact on Wild Birds – Pasquale Bottalico, PhD.

Construction Noise Impact on Wild Birds

Pasquale Bottalico, PhD. – pb@msu.edu

Voice Biomechanics and Acoustics Laboratory
Department of Communicative Sciences and Disorders
College of Communication Arts & Sciences
Michigan State University
1026 Red Cedar Road
East Lansing, MI 48824


Popular version of paper 3aAB7, “Construction noise impact on wild birds”
Presented Tuesday morning, May 25, 2016, 10:20, Salon I
171st ASA Meeting, Salt Lake City



Almost all bird species use acoustic signals to communicate or recognize biological signals – to mate, to detect the sounds of predators and/or prey, to perform mate selection, to defend their territory, and to perform social activities. Noise generated from human activities (in particular by infrastructure and construction sites) has a strong impact on the physiology and behaviour of birds. In this work, a quantitative method for evaluating the impact of noise on wild birds is proposed. The method combines the results of previous studies that considered the effect of noise on birds and involved noise mapping evaluations. A forecast noise simulation was used to generate maps of (1) masking-annoyance areas and (2) potential density variation.

An example of application of the masking-annoyance areas method is shown in Figure 1. If a bird is in the Zone 1 (in purple), traffic noise and construction noise can potentially result in hearing loss and threshold shift. A temporary elevation of the bird’s hearing threshold and a masking of important communication signals can occur in the Zone 2 (in red). Zone 3 (in orange), 4 (in yellow) and 5 (in light green) are characterized by a high, medium and low level of signal masking, respectively. Once the level of noise generated by human activities falls below ambient noise levels in the critical frequencies for communication (2–8 kHz), masking of communication signals is no longer an issue. However, low-frequency noise, such as the rumble of a truck, may still potentially cause other behavioural and/or physiological effects (Zone 6, in green). No effects of any kind occur on the birds in Zone 7 (in dark green). The roles for Zone definition are based on the results of Dooling and Popper. [1]


Figure 1 Mapping of the interaction areas of noise effect on birds within the 7 zones for a project without (a) and with mitigations (b).

Figure 1 Mapping of the interaction areas of noise effect on birds within the 7 zones for a project without (a) and with mitigations (b).

Waterman et al. [2] and Reijnem et al. [3-4-5] proposed a trend of the potential variation in birds density in relationship with the noise levels present in the area. This trend shows no effect on density when the noise levels are lower than 45 dB(A), while there is a rapid decrease (with a quadratic shape) for higher levels. An example of the potential decrease in bird density for a project with and without mitigations is shown in Figure 2. The blue areas are the areas where the birds’ density is not influenced by the noise, while the red ones are the areas from where the birds are leaving because the noise levels are too high.

This methodology permits a localization of the areas with greater impacts on birds. The mitigation interventions should be focused on these areas in order to balance bird habitat conservation and human use of land.


Figure 2 Potential decrease in bird density for a project without (a) and with mitigations (b).


Figure 2 Potential decrease in bird density for a project without (a) and with mitigations (b).



  1. R. J. Dooling and A. N. Popper, The effects of highway noise on birds, Report prepared for The California Department of Transportation Division of Environmental Analysis, (2007).
  2. E. Waterman, I. Tulp, R. Reijnen, K. Krijgsveld and C. ter Braak, “Noise disturbance of meadow birds by railway noise”, Inter-Noise2004, (2004).
  3. R. Reijnen and R. Foppen, “The effects of car traffic on breeding bird populations in woodland. IV. Influence of population size on the reduction of density close to the highway”, J. Appl. Ecol. 32(3), 481-491, (1995).
  4. R. Reijnen, R. Foppen, C. ter Braak and J. Thissen, “The effects of car traffic on breeding bird populations in Woodland. III. Reduction of density in relation to the proximity of main roads”, J. Appl. Ecol. 32(1), 187-202, (1995).
  5. R. Reijnen, G. Veenbaas and R. Foppen, Predicting the Effects of Motorway Traffic on Breeding Bird Populations. Ministry of Transport and Public Works, Delft, Netherlands, (1995).


3aUW8 – A view askew: Bottlenose dolphins improve echolocation precision by aiming their sonar beam to graze the target – Laura N. Kloepper

3aUW8 – A view askew: Bottlenose dolphins improve echolocation precision by aiming their sonar beam to graze the target – Laura N. Kloepper

A view askew: Bottlenose dolphins improve echolocation precision by aiming their sonar beam to graze the target

Laura N. Kloepper– lkloepper@saintmarys.edu
Saint Mary’s College
Notre Dame, IN 46556


Yang Liu–yang.liu@umassd.edu
John R. Buck– jbuck@umassd.edu
University of Massachusetts Dartmouth
285 Old Westport Road
Dartmouth, MA 02747


Paul E. Nachtigall–nachtiga@hawaii.edu
University of Hawaii at Manoa
PO Box 1346
Kaneohe, HI 96744


Popular version of paper 3aUW8, “Bottlenose dolphins direct sonar clicks off-axis of targets to maximize Fisher Information about target bearing”

Presented Wednesday morning, November 4, 2015, 10:25 AM in River Terrace 2

170th ASA Meeting, Jacksonville


Bottlenose dolphins are incredible echolocators. Using just sound, they can detect a ping-pong ball sized object from 100 m away, and discriminate between objects differing in thickness by less than 1 mm. Based on what we know about man-made sonar, however, the dolphins’ sonar abilities are an enigma–simply put, they shouldn’t be as good at echolocation as they actually are.

Typical manmade sonar devices achi­eve high levels of performance by using very narrow sonar beams. Creating narrow beams requires large and costly equipment. In contrast to these manmade sonars, bottlenose dolphins achieve the same levels of performance with a sonar beam that is many times wider–but how? Understanding their “sonar secret” can help lead to more sophisticated synthetic sonar devices.

Bottlenose dolphins’ echolocation signals contain a wide range of frequencies.  The higher frequencies propagate away from the dolphin in a narrower beam than the low frequencies do. This means the emitted sonar beam of the dolphin is frequency-dependent.  Objects directly in front of the animal echo back all of the frequencies.   However, as we move out of the direct line in front of the animal, there is less and less high frequency, and when the target is way off to the side, only the lower frequencies reach the target to bounce back.   As shown below in Fig. 1, an object 30 degrees off the sonar beam axis has lost most of the frequencies.




Figure 1. Beam pattern and normalized amplitude as a function of signal frequency and bearing angle. At 0 degrees, or on-axis, the beam contains an equal representation across all frequencies. As the bearing angle deviates from 0, however, the higher frequency components fall off rapidly.

Consider an analogy to light shining through a prism.  White light entering the prism contains every frequency, but the light leaving the prism at different angles contains different colors.  If we moved a mirror to different angles along the light beam, it would change the color reflected as it moved through different regions of the transmitted beam.  If we were very good, we could locate the mirror precisely in angle based on the color reflected.  If the color changes more rapidly with angle in one region of the beam, we would be most sensitive to small changes in position at that angle, since small changes in position would create large changes in color.  In mathematical terms, this region of maximum change would have the largest gradient of frequency content with respect to angle.  The dolphin sonar appears to be exploiting a similar principle, only the different colors are different frequencies or pitch in the sound.

Prior studies on bottlenose dolphins assumed the animal pointed its beam directly at the target, but this assumption resulted in the conclusion that the animals shouldn’t be as “good” at echolocation as they actually are. What if, instead, they use a different strategy? We hypothesized that the dolphin might be aiming their sonar so that the main axis of the beam passes next to the target, which results in the region of maximum gradient falling on the target. Our model predicts that placing the region of the beam most sensitive to change on the target will give the dolphin greatest precision in locating the object.

To test our hypothesis, we trained a bottlenose dolphin to detect the presence or absence of an aluminum cylinder while we recorded the echolocation signals with a 16-element hydrophone array (Fig.2).

Laura Dolphin Graphics


Figure 2: Experimental setup. The dolphin detected the presence or absence of cylinders at different distances while we recorded sonar beam aim with a hydrophone array.

We then measured where the dolphin directed its sonar beam in relation to the target and found the dolphin pointed its sonar beam 7.05 ± 2.88 degrees (n=1930) away from the target (Fig.3).




Figure 3: Optimality in directing beam away from axis. The numbers on the emitted beam represent the attenuation in decibels relative to the sound emitted from the dolphin. The high frequency beam (red) is narrower than the blue and attenuates at angle more rapidly. The dolphin directs its sonar beam 7 degrees away from the target.

To then determine if certain regions of the sonar beam provide more theoretical “information” to the dolphin, which would improve its echolocation, we applied information theory to the dolphin sonar beam. Using the weighted frequencies present in the signal, we calculated the Fisher Information for the emitted beam of a bottlenose dolphin. From our calculations we determined 95% of the maximum Fisher Information to be between 6.0 and 8.5 degrees off center, with a peak at 7.2 degrees (Fig. 4).




Figure 4: The calculated Fisher Information as a function of bearing angle. The peak of the information is between 6.0 and 8.5 degrees off center, with a peak at 7.2 degrees.

The result? The dolphin is using a strategy that is the mathematically optimal! By directing its sonar beam slightly askew of the target (such as a fish), the target is placed in the highest frequency gradient of the beam, allowing the dolphin to locate the target more precisely.

1pABb1 – Mice ultrasonic detection and localization in laboratory environment – Yegor Sinelnikov

Mice ultrasonic detection and localization in laboratory environment


Yegor Sinelnikov – yegor.sinelnikov@gmail.com
Alexander Sutin, Hady Salloum, Nikolay Sedunov, Alexander Sedunov
Stevens Institute of Technology
Hoboken, NJ 07030

Tom Zimmerman, Laurie Levine
DLAR Stony Brook University
Stony Brook, NY 11790

David Masters
Department of Homeland Security
Science and Technology Directorate
Washington, DC


Popular version of poster 1pABb1, “Mice ultrasonic detection and localization in laboratory environment”
Presented Tuesday afternoon, November 3, 2015, 3:30 PM, Grand Ballroom 3
170th ASA Meeting, Jacksonville


A house mouse, mus musculus, historically shares the human environment without much permission. It lives in our homes, enjoys our husbandry, and passes through walls and administrative borders unnoticed and unaware of our wary attention. Over the thousands of years of coexistence, mice excelled in a carrot and stick approach. Likewise, an ordinary wild mouse brings both danger and cure to humans todays. A danger is in the form of rodent-borne diseases, amongst them plague epidemics, well remembered in European medieval history, continue to pose a threat to human health. A cure is in the form of lending themselves as research subjects for new therapeutic agents, an airily misapprehension of genomic similarities, small size, and short life span. Moreover, physiological similarity in inner ear construction, brain auditory responses and unexpected richness in vocal signaling attested to the tremendous interest to mice bioacoustics and emotion perception.

The goal of this work is to start addressing possible threats reportedly carried by invasive species crossing US borders unnoticed in multiple cargo containers. This study focuses on demonstrating the feasibility of acoustic detection of potential rodent intrusions.

Animals communicate with smell, touch, movement, visual signaling and sound. Mice came well versed in sensorial abilities to face the challenge of sharing habitat with humans. Mice gave up color vision, developed exceptional stereoscopic smell, and learned to be deceptively quiet in human auditory range, discretely shifting their social acoustic interaction to higher frequencies. They predominantly use ultrasonic frequencies above the human hearing range as a part of their day-to-day non aggressive social interaction. Intricate ultrasonic mice songs composed of multiple syllable sounds often constituting complex phrases separated by periods of silence are well known to researchers.

In this study, mice sounds were recorded in a laboratory environment at an animal facility at Stony Brook University Hospital. The mice were allowed to move freely, a major condition for their vocalization in ultrasonic range. Confined to cages, mice did not produce ultrasonic signals. Four different microphones with flat ultrasonic frequency response were positioned in various arrangements and distances from the subjects. The distances varied from a few centimeters to several meters. An exemplary setup is shown in Figure 1. Three microphones, sensitive in the frequency range between 20 kHz and 100 kHz, were connected to preamplifiers via digital converters to a computer equipped with dedicated sound recording software. The fourth calibrated microphone was used for measurements of absolute sound level produced by a mouse. The spectrograms were monitored by an operator in real time to detect the onset of mice communications and simplify line data processing.


Sinenikov fig 1

Figure 1. Setup of experiment showing the three microphones (a) on a table with unrestrained mouse (b),  recording equipment preamplifiers and digitizers (c) and computer (d).
Listen to a single motif of mice ultrasonic vocalization and observe mouse movement here:

This sound fragment was down converted (slowed down) fifteen times to be audible. In reality, mice social songs are well above the human audible range and are very fast. The spectrograms of mice vocalization at distances of 1 m and 5 m are shown in Figure 2. Mice vocalization was detectable at 5 m and retained recognizable vocalization pattern. Farther distances were not tested due to the limitation of the room size.

The real time detection of mice vocalization required detection of the fast, noise insensitive and automated algorithm. An innovative approach was required. Recognizing that no animal communication comes close to become a language, the richness and diversity of mice ultrasonic vocalization prompted us to apply speech processing measures for their real time detection. A number of generic speech processing measures such temporal signal to noise ratio, cepstral distance, and likelihood ratio were tested for the detection of mice vocalization events in the presence of background noise.  These measures were calculated from acoustical measurements and compared with conventional techniques, such as bandpass filtering, spectral power, or continuous monitoring of signal frames for the presence of expected tones.



Figure 2. Sonograms of short ultrasonic vocalization syllables produced by mice at 1 m (left) and 5 m (right) distances from microphones.  The color scale is in the decibels.

Although speech processing measures were invented to assess human speech intelligibility, we found them applicable for the acoustic mice detection within few meters. Leaving aside the question about mice vocalization intelligibly, we concluded that selected speech processing measures enabled us to detect events of mice vocalization better than other generic signal processing techniques.

As a secondary goal of this study, upon successful acoustic detection, the mice vocalization needed to be processed to determine animal location. It was of main interest for border patrol applications, where both acoustic detection and spatial localization are critical, and because mice movement has a behavioral specificity. To prove the localization feasibility, detected vocalization events from each microphone pair were processed to determine the time difference of arrival (TDOA). The analysis was limited to nearby locations by relatively short cabling system. Because the animals were moving freely on the surface of a laboratory table, roughly coplanar with microphones, the TDOA values were converted to the animal location using simple triangulation scheme. The process is illustrated schematically in Figure 3 for two selected microphones. Note that despite low signal to noise ratio for the microphone 2, the vocalization events were successfully detected. The cross correlograms, calculated in spectral domain with empirical normalization to suppress the effect of uncorrelated noise, yielded reliable TDOA. A simple check for the zero sum of TDOA was used as a consistency control. Calculated TDOA were converted into spatial locations, which were assessed for correctness, experimental and computational uncertainties and compared with available video recordings. Despite relatively high level of technogenic noise, the TDOA calculated locations agreed well with video recordings. The TDOA localization uncertainty was estimated on the order of the mouse size, roughly corresponding to several wavelengths at 50 kHz. A larger number of microphones is expected to improve detectability and enable more precise three dimensional localization.

Hence, mice ultrasonic socialization sounds are detectable by the application of speech processing techniques, their TDOA are identifiable by cross correlation and provide decent spatial localization of animals in agreement with video observations.



Figure 3. The localization process. First, the detected vocalization events from two microphones (left) are paired and their cross correlogram is calculated (middle). The maxima, marked by asterisks, define a set of identified TDOA.  The process is repeated for every pair of microphones. Second, the triangulation is performed (right). The colored hyperbolas illustrate possible locations of animal on a laboratory table based on calculated TDOA. Hyperbolas intersection provides the location of animal. The numbered squares mark the location of microphones.


1The constructed recording system is particularly important for the detection of mice in containers  at US ports of entry, where low frequency noises are high. This pilot study confirms the feasibility of using Stevens Institute’s ultrasonic recording system for simultaneous detection of mice vocalization and movement.

This work was funded by the U.S. Department of Homeland Security’s Science and Technology Directorate. The views and conclusions contained in this paper are those of the authors and should not necessarily be interpreted as representing the official policies, either expressed or implied of the U.S. Department of Homeland Security.