Shape changing artificial ear inspired by bats enriches speech signals
Anupam K Gupta1,2, Jin-Ping Han2, Philip Caspers1, Xiaodong Cui2, Rolf Müller1
1 Dept. of Mechanical Engineering, Virginia Tech, Blacksburg, VA, USA
2 IBM T. J. Watson Research Center, Yorktown, NY, USA
Contact: Jin-Ping Han – email@example.com
Popular version of paper 1aSC31, “Horseshoe bat inspired reception dynamics embed dynamic features into speech signals.”
Presented Monday morning, November 28, 2016
172nd ASA Meeting, Honolulu
Have you ever had difficulty understanding what someone was saying to you while walking down a busy big city street, or in a crowded restaurant? Even if that person was right next to you? Words can become difficult to make out when they get jumbled with the ambient noise – cars honking, other voices – making it hard for our ears to pick up what we want to hear. But this is not so for bats. Their ears can move and change shape to precisely pick out specific sounds in their environment.
This biosonar capability inspired our research on artificial ears, which aims to improve the accuracy of automatic speech recognition (ASR) systems and speaker localization. We asked: could we enrich a speech signal with direction-dependent, dynamic features by using bat-inspired reception dynamics?
Horseshoe bats, for example, are found throughout Africa, Europe and Asia and are named for the shape of their noses. They can change the shape of their outer ears to help extract additional information about the environment from incoming ultrasonic echoes. Their sophisticated biosonar systems emit ultrasonic pulses and, while changing ear shape (something other mammals cannot do), listen to the echoes that reflect back from surrounding objects. This allows them to learn about the environment, helping them navigate and hunt in the dense forests where they live.
While probing the environment, horseshoe bats change their ear shape to modulate the incoming echoes, increasing the information content embedded in the echoes. We believe that this shape change is one of the reasons bats’ sonar exhibits such high performance compared to technical sonar systems of similar size.
To test this, we first built a robotic bat head that mimics the ear shape changes we observed in horseshoe bats.
Figure 1: Horseshoe bat inspired robotic set-up used to record speech signal
We then recorded speech signals to explore whether shape change, inspired by the bats, could embed direction-dependent dynamic features into speech signals. The potential applications of this could range from improving hearing aid accuracy to helping a machine more accurately hear – and learn from – sounds in real-world environments.
We compiled a digital dataset of 11 US English speakers from open source speech collections provided by Carnegie Mellon University. The human acoustic utterances were shifted to the ultrasonic domain so our robot could understand and play back the sounds into microphones, while the biomimetic bat head actively moved its ears. The signals at the base of the ears were then translated back to the speech domain to extract the original signal.
This pilot study, performed at IBM Research in collaboration with Virginia Tech, showed that the ear shape change was, in fact, able to significantly modulate the signal. We concluded that these changes, as in horseshoe bats, embed dynamic patterns into speech signals.
The dynamically enriched data we explored improved the accuracy of speech recognition. Compared to a traditional system for hearing and recognizing speech in noisy environments, adding structural movement to a complex outer shape surrounding a microphone, mimicking an ear, significantly improved its performance and access to directional information. In the future, this might improve performance in devices operating in difficult hearing scenarios like a busy street in a metropolitan center.
Figure 2: Example of speech signal recorded without and with the dynamic ear. Top row: speech signal without the dynamic ear. Bottom row: speech signal with the dynamic ear.
Indris’ melodies are individually distinctive and genetically driven
Marco Gamba – firstname.lastname@example.org
Cristina Giacoma – email@example.com
University of Torino
Department of Life Sciences and Systems Biology
Via Accademia Albertina 13
10123 Torino, Italy
Popular version of paper 2aABa3 “Melody in my head, melody in my genes? Acoustic similarity, individuality and genetic relatedness in the indris of Eastern Madagascar”
Presented Tuesday morning, November 29, 2016
172nd ASA Meeting, Honolulu
Melody in my head, melody in my genes?
Acoustic similarity, individuality and genetic relatedness in the indris of Eastern Madagascar
Human hearing abilities are exceptional at identifying the voices of friends and relatives. The potential for this identification lies in the acoustic structures of our words, which not only convey verbal information (the meaning of our words) but also non-verbal cues (such as sex and identity of the speakers).
In animal communication, recognizing a member of the same species can also be important. Birds and mammals may adjust the signals that function in neighbor recognition, and discriminating between a known neighbor and a stranger can result in strikingly different responses in terms of territorial defense.
Indris (Indri indri) are the only lemurs that produce group songs and among the few primate species that communicate using articulated singing displays. The most distinctive portions of the indris’ song are called descending phrases, consisting of between two and five units or notes. We recorded 21 groups of indris in the Eastern rainforests of Madagascar from 2005 to 2015. In each recording, we identified individuals using natural markings. We noticed that group encounters were rare, and hypothesized that song might play a role in providing members of the same species with information about the sex and identity of an individual singer and the emitting group.
We found we could effectively discriminate between the descending phrases of individual indris, showing that the phrases have the potential to advertise sex and individual identity. This strengthened the hypothesis that song may play a role in processes like kinship and mate recognition. Finding that there was a degree of group specificity in the song also supports the idea that neighbor-stranger recognition is important in the indris and that the song may function to announce territorial occupation and spacing.
Traditionally, primate songs are considered an example of a genetically determined display. Thus the following step in our research was to examine whether the structure of the phrases could relate to the genetic relatedness of the indris. We found a significant correlation between the genetic relatedness of the studied individuals and the acoustic similarity of their song phrases. This suggested that genetic relatedness may play a role in determining song similarity.
For the first time, we found evidence that the similarity of a primate vocal display changes within a population in a way that is strongly associated with kin. When examining differences between sexes, we found that male offspring showed phrases that were more similar to their fathers’, while daughters did not show similarity with either of their parents.
The potential for kin detection may play a vital role in determining relationships within a population, regulating dispersal, and avoiding inbreeding. Singing displays may advertise kin to signal against potential mating, information that females, and to a lesser degree males, can use when forming a new group. Unfortunately, we still do not know whether indris can perceptually decode this information or how they use it in their everyday life. But work like this sets the basis for understanding primates’ mating and social systems and lays the foundation for better conservation methods.
- Belin, P. Voice processing in human and non-human primates. Philosophical Transactions of the Royal Society B: Biological Sciences, 2006. 361: p. 2091-2107.
- Randall, J. A. Discrimination of foot drumming signatures by kangaroo rats, Dipodomys spectabilis. Animal Behaviour, 1994. 47: p. 45-54.
- Gamba, M., Torti, V., Estienne, V., Randrianarison, R. M., Valente, D., Rovara, P., Giacoma, C. The Indris Have Got Rhythm! Timing and Pitch Variation of a Primate Song Examined between Sexes and Age Classes. Frontiers in Neuroscience, 2016. 10: p. 249.
- Torti, V., Gamba, M., Rabemananjara, Z. H., Giacoma, C. The songs of the indris (Mammalia: Primates: Indridae): contextual variation in the long-distance calls of a lemur. Italian Journal of Zoology, 2013. 80, 4.
- Barelli, C., Mundry, R., Heistermann, M., Hammerschmidt, K. Cues to androgen and quality in male gibbon songs. PLoS ONE, 2013. 8: e82748.
Figure 1. A female indri with offspring in the Maromizaha Forest, Madagascar. Maromizaha is a New Protected Area located in the Alaotra-Mangoro Region of eastern Madagascar. It is managed by GERP (Primate Studies and Research Group). At least 13 species of lemurs have been observed in the area.
Figure 2. Spectrograms of an indri song showing a typical sequence of different units. In the enlarged area, the pitch contour in red shows a typical “descending phrase” of 4 units. The indris also emit phrases of 2, 3 and more rarely 5 or 6 units.
Figure 3. A 3D plot of the dimensions (DF1, DF2, DF3) generated from a discriminant model that successfully assigned descending phrases of four units (DP4) to the emitter. Colours denote individuals. The descending phrases of two (DP2) and three units (DP3) also showed a correct classification rate significantly above chance.
Construction Noise Impact on Wild Birds
Pasquale Bottalico, PhD. – firstname.lastname@example.org
Voice Biomechanics and Acoustics Laboratory
Department of Communicative Sciences and Disorders
College of Communication Arts & Sciences
Michigan State University
1026 Red Cedar Road
East Lansing, MI 48824
Popular version of paper 3aAB7, “Construction noise impact on wild birds”
Presented Wednesday morning, May 25, 2016, 10:20, Salon I
171st ASA Meeting, Salt Lake City
Almost all bird species use acoustic signals to communicate or recognize biological signals – to mate, to detect the sounds of predators and/or prey, to perform mate selection, to defend their territory, and to perform social activities. Noise generated from human activities (in particular by infrastructure and construction sites) has a strong impact on the physiology and behaviour of birds. In this work, a quantitative method for evaluating the impact of noise on wild birds is proposed. The method combines the results of previous studies that considered the effect of noise on birds and involved noise mapping evaluations. A forecast noise simulation was used to generate maps of (1) masking-annoyance areas and (2) potential density variation.
An example application of the masking-annoyance areas method is shown in Figure 1. If a bird is in Zone 1 (in purple), traffic noise and construction noise can potentially result in hearing loss and threshold shift. A temporary elevation of the bird’s hearing threshold and a masking of important communication signals can occur in Zone 2 (in red). Zones 3 (in orange), 4 (in yellow) and 5 (in light green) are characterized by a high, medium and low level of signal masking, respectively. Once the level of noise generated by human activities falls below ambient noise levels in the critical frequencies for communication (2–8 kHz), masking of communication signals is no longer an issue. However, low-frequency noise, such as the rumble of a truck, may still potentially cause other behavioural and/or physiological effects (Zone 6, in green). No effects of any kind occur on the birds in Zone 7 (in dark green). The rules for zone definition are based on the results of Dooling and Popper [1].
Figure 1 Mapping of the interaction areas of noise effect on birds within the 7 zones for a project without (a) and with mitigations (b).
Waterman et al. [2] and Reijnen et al. [3-5] proposed a trend for the potential variation in bird density in relation to the noise levels present in the area. This trend shows no effect on density when the noise levels are lower than 45 dB(A), while there is a rapid decrease (with a quadratic shape) for higher levels. An example of the potential decrease in bird density for a project with and without mitigations is shown in Figure 2. The blue areas are those where bird density is not influenced by the noise, while the red ones are those that birds are leaving because the noise levels are too high.
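The shape of this trend can be sketched as a simple piecewise function. Only the 45 dB(A) threshold comes from the text; the function name and the curvature `k` are hypothetical placeholders chosen for illustration, not values from the cited studies.

```python
def relative_density(level_dba, threshold=45.0, k=0.001):
    """Fraction of the undisturbed bird density remaining at a noise level.

    threshold: 45 dB(A), the level below which the text reports no effect.
    k: hypothetical curvature of the quadratic decline (illustrative only).
    """
    if level_dba <= threshold:
        return 1.0  # no effect on density at or below the threshold
    excess = level_dba - threshold
    # Quadratic decline above the threshold, clamped so it never goes negative.
    return max(0.0, 1.0 - k * excess ** 2)


for level in (40, 45, 55, 65, 80):
    print(level, "dB(A) ->", round(relative_density(level), 3))
```

Applying such a function to every cell of a noise map would give a density-variation map of the kind shown in Figure 2, assuming a fitted value of `k` were available.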
This methodology permits a localization of the areas with greater impacts on birds. The mitigation interventions should be focused on these areas in order to balance bird habitat conservation and human use of land.
Figure 2 Potential decrease in bird density for a project without (a) and with mitigations (b).
- R. J. Dooling and A. N. Popper, The effects of highway noise on birds, Report prepared for The California Department of Transportation Division of Environmental Analysis, (2007).
- E. Waterman, I. Tulp, R. Reijnen, K. Krijgsveld and C. ter Braak, “Noise disturbance of meadow birds by railway noise”, Inter-Noise2004, (2004).
- R. Reijnen and R. Foppen, “The effects of car traffic on breeding bird populations in woodland. IV. Influence of population size on the reduction of density close to the highway”, J. Appl. Ecol. 32(3), 481-491, (1995).
- R. Reijnen, R. Foppen, C. ter Braak and J. Thissen, “The effects of car traffic on breeding bird populations in Woodland. III. Reduction of density in relation to the proximity of main roads”, J. Appl. Ecol. 32(1), 187-202, (1995).
- R. Reijnen, G. Veenbaas and R. Foppen, Predicting the Effects of Motorway Traffic on Breeding Bird Populations. Ministry of Transport and Public Works, Delft, Netherlands, (1995).
A view askew: Bottlenose dolphins improve echolocation precision by aiming their sonar beam to graze the target
Laura N. Kloepper – email@example.com
Saint Mary’s College
Notre Dame, IN 46556
John R. Buck – firstname.lastname@example.org
University of Massachusetts Dartmouth
285 Old Westport Road
Dartmouth, MA 02747
Paul E. Nachtigall – email@example.com
University of Hawaii at Manoa
PO Box 1346
Kaneohe, HI 96744
Popular version of paper 3aUW8, “Bottlenose dolphins direct sonar clicks off-axis of targets to maximize Fisher Information about target bearing”
Presented Wednesday morning, November 4, 2015, 10:25 AM in River Terrace 2
170th ASA Meeting, Jacksonville
Bottlenose dolphins are incredible echolocators. Using just sound, they can detect a ping-pong ball sized object from 100 m away, and discriminate between objects differing in thickness by less than 1 mm. Based on what we know about man-made sonar, however, the dolphins’ sonar abilities are an enigma–simply put, they shouldn’t be as good at echolocation as they actually are.
Typical manmade sonar devices achieve high levels of performance by using very narrow sonar beams. Creating narrow beams requires large and costly equipment. In contrast to these manmade sonars, bottlenose dolphins achieve the same levels of performance with a sonar beam that is many times wider–but how? Understanding their “sonar secret” can help lead to more sophisticated synthetic sonar devices.
Bottlenose dolphins’ echolocation signals contain a wide range of frequencies. The higher frequencies propagate away from the dolphin in a narrower beam than the low frequencies do. This means the emitted sonar beam of the dolphin is frequency-dependent. Objects directly in front of the animal echo back all of the frequencies. However, as we move out of the direct line in front of the animal, there is less and less high frequency, and when the target is far off to the side, only the lower frequencies reach the target to bounce back. As shown below in Fig. 1, an object 30 degrees off the sonar beam axis has lost most of the higher frequencies.
Figure 1. Beam pattern and normalized amplitude as a function of signal frequency and bearing angle. At 0 degrees, or on-axis, the beam contains an equal representation across all frequencies. As the bearing angle deviates from 0, however, the higher frequency components fall off rapidly.
Consider an analogy to light shining through a prism. White light entering the prism contains every frequency, but the light leaving the prism at different angles contains different colors. If we moved a mirror to different angles along the light beam, it would change the color reflected as it moved through different regions of the transmitted beam. If we were very good, we could locate the mirror precisely in angle based on the color reflected. If the color changes more rapidly with angle in one region of the beam, we would be most sensitive to small changes in position at that angle, since small changes in position would create large changes in color. In mathematical terms, this region of maximum change would have the largest gradient of frequency content with respect to angle. The dolphin sonar appears to be exploiting a similar principle, only the different colors are different frequencies or pitch in the sound.
Prior studies on bottlenose dolphins assumed the animal pointed its beam directly at the target, but this assumption resulted in the conclusion that the animals shouldn’t be as “good” at echolocation as they actually are. What if, instead, they use a different strategy? We hypothesized that the dolphin might be aiming its sonar so that the main axis of the beam passes next to the target, which results in the region of maximum gradient falling on the target. Our model predicts that placing the region of the beam most sensitive to change on the target will give the dolphin the greatest precision in locating the object.
To test our hypothesis, we trained a bottlenose dolphin to detect the presence or absence of an aluminum cylinder while we recorded the echolocation signals with a 16-element hydrophone array (Fig.2).
Figure 2: Experimental setup. The dolphin detected the presence or absence of cylinders at different distances while we recorded sonar beam aim with a hydrophone array.
We then measured where the dolphin directed its sonar beam in relation to the target and found the dolphin pointed its sonar beam 7.05 ± 2.88 degrees (n=1930) away from the target (Fig.3).
Figure 3: Optimality in directing beam away from axis. The numbers on the emitted beam represent the attenuation in decibels relative to the sound emitted from the dolphin. The high-frequency beam (red) is narrower than the blue beam and attenuates more rapidly with angle. The dolphin directs its sonar beam 7 degrees away from the target.
To then determine if certain regions of the sonar beam provide more theoretical “information” to the dolphin, which would improve its echolocation, we applied information theory to the dolphin sonar beam. Using the weighted frequencies present in the signal, we calculated the Fisher Information for the emitted beam of a bottlenose dolphin. From our calculations we determined 95% of the maximum Fisher Information to be between 6.0 and 8.5 degrees off center, with a peak at 7.2 degrees (Fig. 4).
Figure 4: The calculated Fisher Information as a function of bearing angle. 95% of the maximum Fisher Information lies between 6.0 and 8.5 degrees off center, with a peak at 7.2 degrees.
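To illustrate why the information about bearing can peak off-axis, here is a minimal sketch under stated assumptions. It is not the authors' calculation: the Gaussian beam shape, the beam widths per frequency band, and the noise level are all hypothetical, so the peak it finds will not reproduce the 7.2 degrees reported above. It only shows that when each band's amplitude changes with angle, the Fisher Information is zero on-axis (where the beam is flat) and largest where the amplitudes change fastest.

```python
import math

# Hypothetical beam widths (degrees) per frequency band (Hz):
# higher frequencies get narrower beams, as described in the text.
BEAM_WIDTHS_DEG = {30e3: 14.0, 60e3: 10.0, 90e3: 7.0, 120e3: 5.0}
NOISE_SD = 0.05  # assumed noise on each band's normalized amplitude


def amplitude(theta_deg, width_deg):
    """Assumed Gaussian beam pattern: normalized amplitude at bearing theta."""
    return math.exp(-theta_deg ** 2 / (2.0 * width_deg ** 2))


def fisher_information(theta_deg):
    """Fisher Information about bearing, assuming independent Gaussian noise.

    I(theta) = sum over bands of (d amplitude / d theta)^2 / sigma^2
    """
    total = 0.0
    for width in BEAM_WIDTHS_DEG.values():
        # Analytic derivative of the Gaussian beam pattern with respect to theta.
        slope = -(theta_deg / width ** 2) * amplitude(theta_deg, width)
        total += slope ** 2 / NOISE_SD ** 2
    return total


angles = [i * 0.1 for i in range(0, 301)]  # 0 to 30 degrees in 0.1 degree steps
info = [fisher_information(a) for a in angles]
best = angles[info.index(max(info))]
print(f"information on-axis: {info[0]:.3f}")
print(f"information peaks at {best:.1f} degrees off-axis")
```

In this toy model the narrowest (highest-frequency) band dominates, so the peak sits near that band's width; a realistic calculation would use the measured beam pattern and the weighted frequency content of the click.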
The result? The dolphin is using a strategy that is mathematically optimal! By directing its sonar beam slightly askew of the target (such as a fish), the dolphin places the target in the region of the beam with the highest frequency gradient, allowing it to locate the target more precisely.
Energetically speaking, do all sounds that a dolphin makes cost the same?
Marla M. Holt – firstname.lastname@example.org
Dawn P. Noren – email@example.com
Conservation Biology Division
NOAA NMFS Northwest Fisheries Science Center
2725 Montlake Blvd East
Seattle, WA 98112
Robin C. Dunkin – firstname.lastname@example.org
Terrie M. Williams – email@example.com
Department of Ecology and Evolutionary Biology
University of California, Santa Cruz
100 Shaffer Road
Santa Cruz, CA 95060
Popular version of paper 2pABa9, “The metabolic costs of producing clicks and social sounds differ in bottlenose dolphins (Tursiops truncatus).”
Presented Tuesday afternoon, November 3, 2015, 3:15, City Terrace room
170th ASA Meeting, Jacksonville
Dolphins are known to be quite vocal, producing a variety of sounds described as whistles, squawks, barks, quacks, pops, buzzes and clicks. These sounds can be tonal (think whistle) or broadband (think buzz), short or long, or loud or not. Some sounds, such as whistles, are used in social contexts for communication. Other sounds, such as clicks and buzzes, are used for echolocation, a form of active biosonar that is important for hunting fish. Regardless of what type of sound a dolphin makes in its diverse vocal repertoire, sounds are generated in an anatomically unique way compared to other mammals. Most mammals, including humans, make sound in their throats or, technically, in the larynx. In contrast, dolphins make sound in their nasal cavity via two sets of structures called the “phonic lips”.
All sound production comes at an energetic cost to the signaler. That is, when an animal produces sound, metabolic rate increases a certain amount above baseline or resting (metabolic) rate. Additionally, many vociferous animals, including dolphins and other marine mammals, modify their acoustic signals in noise. That is, they call louder, longer or more often in an attempt to be heard above the background din. Ocean noise levels are rising, particularly in some areas, from shipping traffic and other anthropogenic activities, and this motivated a series of recent studies to understand the metabolic costs of sound production and vocal modification in dolphins.
We recently measured the energetic cost for both social sound and click production in dolphins and determined whether these costs increased when the animals increased the loudness or other parameters of their sounds [4,5]. Two bottlenose dolphins were trained to rest and vocalize under a specialized dome which allowed us to measure their metabolic rates while making different kinds of sounds and while resting (Figure 1). The dolphins also wore an underwater microphone (a hydrophone embedded in a suction cup) on their foreheads to keep track of vocal performance during trials. The amount of metabolic energy that the dolphins used increased as the total acoustic energy of the vocal bout increased regardless of the type of sound the dolphin made. The results clearly demonstrate that higher vocal effort results in higher energetic cost to the signaler.
Figure 1 – A dolphin participating in a trial to measure metabolic rates during sound production. Trials were conducted in Dr. Terrie Williams’ Mammalian Physiology lab at the University of California Santa Cruz. All procedures were approved by the UC Santa Cruz Institutional Animal Care and Use Committee and conducted under US National Marine Fisheries Service permit No.13602.
These recent results allow us to compare metabolic costs of production of different sound types. However, the average total energy content of the sounds produced per trial was different depending on the dolphin subject and whether the dolphins were producing social sounds or clicks. Since metabolic cost is dependent on vocal effort, metabolic cost comparisons across sound types need to be made for equal energy sound production.
The relationship between energetic cost and vocal effort for social sounds allowed us to predict metabolic costs of producing these sounds at the same sound energy as in click trials. The results, shown in Figure 2, demonstrate that bottlenose dolphins produce clicks at a very small fraction of the metabolic cost of producing whistles of equal energy. These findings are consistent with empirical observations demonstrating that considerably higher air pressure within the dolphin nasal passage is required to generate whistles compared to clicks. This pressurized air is what powers sound production in dolphins and toothed whales and mechanistically explains the observed difference in metabolic cost between the different sound types.
Figure 2 – Metabolic costs of producing social sounds and clicks of equal energy content within a dolphin subject.
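The equal-energy comparison can be sketched as a fit-then-predict step. This is only an illustration: the straight-line form is an assumption (the text says only that cost increases with acoustic energy), and every number below is made up and unitless, not measured dolphin data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up, unitless values for illustration only (NOT measured data).
social_energy = [1.0, 2.0, 3.0, 4.0]  # acoustic energy of social-sound bouts
social_cost = [3.0, 5.0, 7.0, 9.0]    # metabolic cost of those bouts
click_energy = 0.5                    # acoustic energy of a click bout

# Fit cost versus energy for social sounds, then evaluate the fit at the
# (lower) energy of the click bout so both sound types share one energy level.
slope, intercept = fit_line(social_energy, social_cost)
predicted_social_cost = slope * click_energy + intercept
print("predicted cost of social sounds at click-level energy:",
      predicted_social_cost)
```

The predicted value would then be compared with the cost actually measured during click trials at that same energy.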
Differences in metabolic costs of whistling versus clicking have implications for understanding the biological consequences of behavioral responses to ocean noise. Across different sound types, metabolic costs depend on vocal effort. Yet, overall costs of producing clicks are substantially lower than costs of producing whistles. The results reported in this paper demonstrate that the biological consequences of vocal responses to noise can be quite different depending on the behavioral context of the animals affected, as well as the extent of the response.
- Au, W. W. L. The Sonar of Dolphins, New York: Springer-Verlag.
- Cranford, T. W., et al., Observation and analysis of sonar signal generation in the bottlenose dolphin (Tursiops truncatus): evidence for two sonar sources. Journal of Experimental Marine Biology and Ecology, 2011. 407: p. 81-96.
- Ophir, A. G., Schrader, S. B. and Gillooly, J. F., Energetic cost of calling: general constraints and species-specific differences. Journal of Evolutionary Biology, 2010. 23: p. 1564-1569.
- Noren, D. P., Holt, M. M., Dunkin, R. C. and Williams, T. M. The metabolic cost of communicative sound production in bottlenose dolphins (Tursiops truncatus). Journal of Experimental Biology, 2013. 216: 1624-1629.
- Holt, M. M., Noren, D. P., Dunkin, R. C. and Williams, T. M. Vocal performance affects metabolic rate in dolphins: implication for animals communicating in noisy environments. Journal of Experimental Biology, 2015. 218: 1647-1654.