Kidney stone pushing and trapping using focused ultrasound beams of different structure
Oleg Sapozhnikov – firstname.lastname@example.org
Mike Bailey – email@example.com
Adam Maxwell – firstname.lastname@example.org
Moscow State University
Center for Industrial and Medical Ultrasound
Applied Physics Laboratory
University of Washington
Seattle, Washington, United States
Popular version of paper 4pBA1, “Kidney stone pushing and trapping using focused ultrasound beams of different structure.”
Presented Thursday afternoon, December 1, 2016, at 1:00 pm HAST.
172nd ASA Meeting, Honolulu
Urinary stones (such as kidney or bladder stones) are an important health care problem. One in 11 Americans now has urinary stone disease (USD), and the prevalence is increasing. According to a 2012 report from the National Institute of Diabetes and Digestive and Kidney Diseases (Urological Diseases in America), the direct medical cost of USD in the United States is $10 billion annually, making it the most expensive urologic condition.
Our lab is working to develop more effective and more efficient ways to treat stones. Existing treatments such as shock wave lithotripsy or ureteroscopy are minimally invasive, but can leave behind stone fragments that remain in the kidney and potentially regrow into larger stones over time. We have successfully developed and demonstrated the use of ultrasound to noninvasively move stones in the kidney of human subjects. This technology, called ultrasonic propulsion (UP), uses ultrasound to apply a directional force to the stone, propelling it in a direction away from the sonic source, or transducer. Some stones need to be moved towards the ureteropelvic junction (the exit from the kidney that allows stones to pass through the ureter to the bladder) to aid their passage. In other cases, this technology may be useful to relieve an obstruction caused by a stone that may just need a nudge or a rotation to pass, or at least to allow urine to flow and decompress the kidney.
While UP is able to help stones pass, it is limited in how the stones can be manipulated by an ultrasound transducer in contact with the skin from outside the body. Some applications require the stone to be moved sideways or towards the transducer rather than away from it.
To achieve more versatile manipulation of stones, we are developing a new strategy to effectively trap a stone in an ultrasound beam. Acoustic trapping has been explored by several other researchers, particularly for trapping and manipulating cells, bubbles, droplets, and particles much smaller than the wavelength of the sound. Different configurations have been used to trap particles in standing waves and focused fields. By trapping the stone in an ultrasound beam, we can then move the transducer or electronically steer the beam to move the stone with it.
In this work, we accomplished trapping through the use of vortex beams. Typical focused beams create a single region of high ultrasound pressure, producing radiation force away from the transducer. Vortex beams, on the other hand, are focused beams that create a tube-shaped intensity pattern with a spiraling wave front (Fig. 1). The ultrasound pressure in the middle is very low, while the pressure around the center is high. The result is that there is a component of the ultrasound radiation force pushing the stone towards the center and trapping it in the middle of the beam. In addition to trapping, such a beam can apply a torque to the stone and can rotate it.
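The donut-shaped pressure pattern described above can be sketched numerically. In an idealized model (our assumption here, not the simulation code used in the paper), the focal-plane pressure of a vortex beam of charge m varies with radius roughly as the Bessel function J_m(kr), multiplied by a spiral phase factor exp(imφ); for m = 1 the amplitude is exactly zero on the axis and peaks on a ring around it:

```python
import numpy as np

def bessel_j(m, x, steps=4001):
    """J_m(x) via its integral form, to keep the sketch dependency-free:
    J_m(x) = (1/pi) * integral_0^pi cos(m*tau - x*sin(tau)) d(tau)."""
    tau = np.linspace(0.0, np.pi, steps)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    integrand = np.cos(m * tau[None, :] - x[:, None] * np.sin(tau)[None, :])
    return integrand.mean(axis=1)  # uniform-grid average approximates the integral

# Focal-plane radial pressure profile of a charge-1 vortex beam:
# a null on the axis (where the stone is trapped) surrounded by a bright ring.
kr = np.linspace(0.0, 6.0, 121)        # radial distance in units of 1/k
amplitude = np.abs(bessel_j(1, kr))
on_axis, ring_peak = amplitude[0], amplitude.max()
```

Here `on_axis` is numerically zero while `ring_peak` is about 0.58 near kr ≈ 1.8, reproducing the donut profile of Figure 1; a higher charge m pushes the ring outward, widening the trap.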
To test this idea, we simulated the radiation force on spheres of different materials (including stones) to determine how each would respond in a vortex beam. An example is shown in Fig. 2. A lateral-axial cross section of the beam is displayed, with a spherical stone off-center in the tube-shaped beam. The red arrow shows that the force on the sphere is away from the center because the stone is outside of the vortex. Once the center of the stone crosses the peak, the force is directed inward. Usually, some force directed away from the transducer remains, but the object can be trapped against a surface.
We also built transducers and electrically excited them to generate the vortex in experiments. At first, we used the vortex to trap, rotate, and drag an object on the water surface (Fig. 3). By changing the charge of the vortex beam (the rate of spiraling generated by the transducer), we controlled the diameter of the vortex beam, as well as the direction and speed at which the objects rotated. We also tested manipulation of objects placed deep in a water tank. Glass or plastic beads and kidney stones were placed on a platform of tissue-mimicking material. By physically shifting the transducer, we were able to move these objects a specified distance and direction along the platform (Fig. 4). These results are best seen in videos at apl.uw.edu/pushing stones.
We have since worked on developing vortex beams with a 256-element focused array transducer. Our complex array can electronically move the beam and drag the stone without physically moving the transducer. In a highly focused transducer, such as our array, sound can even be focused beyond the stone to generate an axial high pressure spot to help trap a stone axially or even pull the stone toward the transducer.
There are several ways in which this technology might be useful for kidney stones. In some cases, it might be employed in gathering small particles together and moving them collectively, holding a larger stone in place for fragmentation techniques such as lithotripsy, sweeping a stone when the anatomy inhibits direct radiation force away from the transducer, or, as addressed here, dragging or pulling a stone. In the future, we expect to continue developing phased array technology to more controllably manipulate stones. We are also working to develop and validate new beam profiles and electronic controls to remotely gather the stone and move it to a new location. We expect that this sort of tractor beam could also have applications in manufacturing, such as ever-shrinking electronics, and even in space.
This work was supported by RFBR 14-02-00426, NIH NIDDK DK43881 and DK104854, and NSBRI through NASA NCC 9-58.
- O.A. Sapozhnikov and M.R. Bailey, "Radiation force of an arbitrary acoustic beam on an elastic sphere in a fluid," J. Acoust. Soc. Am. 133(2), 661-676 (2013).
- A.V. Nikolaeva, S.A. Tsysar, and O.A. Sapozhnikov, "Measuring the radiation force of megahertz ultrasound acting on a solid spherical scatterer," Acoustical Physics 62(1), 38-45 (2016).
- J.D. Harper, B.W. Cunitz, B. Dunmire, F.C. Lee, M.D. Sorensen, R.S. Hsi, J. Thiel, H. Wessells, J.E. Lingeman, and M.R. Bailey, "First in human clinical trial of ultrasonic propulsion of kidney stones," J. Urology 195(4, Part 1), 956-964 (2016).
Figure 1. The cross section at the focus of the ultrasound pressure of a vortex beam. The pressure magnitude (left) has a donut-shaped distribution, whereas the phase (right) has a spiral-shaped structure. A stone can be trapped at the center of the ring.
Figure 2. The simulated tubular field, such as occurs in a vortex beam, and its force on a stone. In this simulation the transducer is on the left and the ultrasound propagates to the right. The arrow indicates the force, which depends on the position of the stone (see video 'Vortex beam for transversal trapping.wmv').
Figure 3. A small object trapped in the center of a vortex beam on the water surface. The ring-shaped impression due to radiation force on the surface can be seen. The phase difference between each sector element of the transducer affects the diameter of the beam and the spin rate. The 2 mm plastic object floating on the surface is made to rotate by the vortex beam (see video 'Vortex beam rotates piece of plastic on water surface.wmv').
Figure 4. A focused vortex beam transducer in water (top) traps one of the styrofoam beads (bottom) and translates it in the lateral direction (see video 'Trapping and moving styrofoam beads underwater.wmv').
Effects of meaningful or meaningless noise on psychological impression for annoyance and selective attention to stimuli during intellectual task
Takahiro Tamesue – email@example.com
1677-1 Yoshida, Yamaguchi
Yamaguchi Prefecture 753-8511
Popular version of poster 4aPPa24, “Effects of meaningful or meaningless noise on psychological impression for annoyance and selective attention to stimuli during intellectual task”
Presented Thursday morning, December 1, 2016
172nd ASA Meeting, Honolulu
Open offices that make effective use of limited space and encourage dialogue, interaction, and collaboration among employees are becoming increasingly common. However, productive work-related conversation might actually decrease the performance of other employees within earshot — more so than random, meaningless noise. When carrying out intellectual activities involving memory or arithmetic tasks, it is a common experience for noise to cause an increased psychological impression of "annoyance," leading to a decline in performance. This is more apparent for meaningful noise, such as conversation, than it is for random, meaningless noise. In this study, we investigated, through physiological and psychological experiments, the impact of meaningless and meaningful noise on selective attention and cognitive performance in volunteers, as well as the degree of subjective annoyance of those noises.
The experiments were based on the so-called "odd-ball" paradigm — a test used to examine selective attention and information processing ability. In the odd-ball paradigm, subjects detect and count rare target events embedded in a series of repetitive events. Completing the odd-ball task requires regulating attention to a stimulus. In one trial, subjects had to count the number of times the infrequent target sound occurred under meaningless or meaningful noise over a 10-minute period. The infrequent sound — appearing 20% of the time — was a 2 kHz tone burst; the frequent sound was a 1 kHz tone burst. In a visual odd-ball test, subjects observed pictures flashing on a PC monitor as meaningless or meaningful sounds were played to both ears through headphones. The infrequent image was a 10 × 10 centimeter red square; the frequent image was a green square. At the end of the trial, the subjects also rated their level of annoyance at each sound on a seven-point scale.
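The odd-ball sequence itself is simple to sketch in code. This illustrative fragment (not the software used in the study) draws each stimulus independently with a 20% target probability, as in the auditory task:

```python
import random

def oddball_sequence(n_trials=200, p_target=0.2, seed=1):
    """Generate an odd-ball stimulus list: 'target' is the infrequent event
    (the 2 kHz burst / red square), 'standard' the frequent one."""
    rng = random.Random(seed)
    return ['target' if rng.random() < p_target else 'standard'
            for _ in range(n_trials)]

sequence = oddball_sequence()
reported = sequence.count('target')   # the count a subject is asked to report
```

Comparing the subject's reported count with the true count of targets gives a simple behavioral measure of how well attention was sustained under each noise condition.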
During the experiments, the subjects' brain waves were measured through electrodes placed on their scalps. In particular, we looked at what are called "event-related potentials," very small voltages generated in brain structures in response to specific events or stimuli, recorded as electroencephalograph waveforms. Example averaged waveforms of event-related potentials under no external noise are shown in Figure 1. The so-called N100 component peaks negatively about 100 milliseconds after the stimulus, and the P300 component peaks positively around 300 milliseconds after a stimulus; both are related to selective attention and working memory. Figures 2 and 3 show the event-related potentials for the infrequent sound under meaningless and meaningful noise. The N100 and P300 components are smaller in amplitude and longer in latency under the meaningful noise than under the meaningless noise.
Figure 1. Averaged wave forms of evoked Event-related potentials for infrequent sound under no external noise.
Figure 2. Averaged wave forms of evoked Event-related potentials for infrequent sound under meaningless noise.
Figure 3. Averaged wave forms of auditory evoked Event-related potentials under meaningful noise.
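The "appropriate averaging" mentioned above is the standard trick for recovering event-related potentials: the microvolt-scale N100 and P300 components are buried in much larger ongoing EEG activity, but averaging many stimulus-locked epochs suppresses the background noise by the square root of the number of epochs. A toy illustration with synthetic data (the amplitudes and noise level here are assumptions, not the study's recordings):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 0.6, 300)                    # 600 ms epoch, in seconds
# Assumed template: a 4 uV N100 dip at 100 ms and a 6 uV P300 peak at 300 ms
erp = (-4e-6 * np.exp(-((t - 0.10) / 0.02) ** 2)
       + 6e-6 * np.exp(-((t - 0.30) / 0.05) ** 2))
# Each single trial buries the ERP in ~20 uV of ongoing EEG "noise"
epochs = erp + 20e-6 * rng.standard_normal((1000, t.size))
average = epochs.mean(axis=0)   # stimulus-locked average; noise shrinks ~sqrt(1000)x
```

After averaging 1000 such epochs the residual noise is about 0.6 µV, so the N100 and P300 deflections stand out clearly, which is why the waveforms in Figures 1-3 are shown after averaging.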
We employed a statistical method called "principal component analysis" to identify the latent components; four principal components were extracted, as shown in Figure 4. Because the component scores under meaningful noise were smaller than under the other noise conditions, we conclude that meaningful noise reduces the components of the event-related potentials. Thus, selective attention to cognitive tasks was influenced by the degree of meaningfulness of the noise.
Figure 4. Loadings of principal component analysis
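Principal component analysis of ERP waveforms can be sketched with a singular value decomposition. This generic fragment (not the study's analysis code) treats each waveform as a row, centers the data, and returns temporal loadings like those in Figure 4 together with per-waveform component scores like those compared across noise conditions:

```python
import numpy as np

def principal_components(waveforms, n_components=4):
    """PCA of ERP waveforms (rows = trials/conditions, columns = time samples).
    Returns temporal loadings and per-waveform component scores."""
    centered = waveforms - waveforms.mean(axis=0)     # remove the grand average
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    loadings = vt[:n_components]                      # orthogonal time courses
    scores = centered @ loadings.T                    # amplitude of each component
    return loadings, scores
```

Comparing the scores between the meaningless- and meaningful-noise conditions is the comparison reported in the text: smaller scores for a component mean that component of the ERP was reduced.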
Figure 5 shows the results for annoyance in the auditory odd-ball paradigms. These results demonstrate that the subjective experience of annoyance in response to noise increased with the meaningfulness of the noise. Whether the noise is meaningless or meaningful thus had a strong influence not only on selective attention to auditory stimuli in cognitive tasks, but also on the subjective experience of annoyance.
Figure 5. Subjective experience of annoyance (Auditory odd-ball paradigms)
This means that when designing sound environments in spaces used for cognitive tasks, such as workplaces or schools, it is appropriate to consider not only the sound level, but also the meaningfulness of the noise that is likely to be present. Surrounding conversations often disturb the business operations conducted in such open offices. Because it is difficult to soundproof an open office, a way to mask meaningful speech with some other sound would be of great benefit for achieving a comfortable sound environment.
How do narration experts provide expressive storytelling in Japanese fairy tales?
Takashi Saito – firstname.lastname@example.org
Shonan Institute of Technology
Fujisawa, Kanagawa, JAPAN
Popular version of paper 2pSC, “Prosodic analysis of storytelling speech in Japanese fairy tale”
Presented Tuesday afternoon, November 29, 2016
172nd ASA Meeting, Honolulu
Recent advances in speech synthesis technologies have brought us relatively high-quality synthetic speech, which smartphones today often provide as speech message output. The acoustic sound quality, in particular, can come close to that of human voices. Prosodic aspects, or the patterns of rhythm and intonation, however, still leave large room for improvement. The overall speech messages generated by speech synthesis systems sound somewhat awkward and monotonous; in other words, they lack the expressiveness of human speech. One reason for this is that most systems use a one-sentence synthesis scheme in which each sentence in the message is generated independently and the sentences are simply concatenated to construct the message. This lack of expressiveness might hinder widening the range of applications for speech synthesis. Storytelling is a typical application where speech synthesis would be expected to have a control mechanism beyond a single sentence in order to provide really vivid and expressive narration. This work investigates the actual storytelling strategies of human narration experts for the purpose of ultimately reflecting them in the expressiveness of speech synthesis.
A popular Japanese fairy tale titled "The Inch-High Samurai," in its English translation, was the storytelling material in this study. It is a short story taking about six minutes to tell aloud. The story consists of four elements typically found in simple fairy tales: introduction, build-up, climax, and ending. These common features make the story well suited for observing prosodic changes in the story's flow. The story was told by six narration experts (four female and two male narrators) and recorded. First, we were interested in what they were thinking while telling the story, so we interviewed them on their actual reading strategies after the recording. We found they usually did not adopt fixed reading techniques for each sentence, but tried to enter the world of the story and form a clear image of the characters, as an actor would. They also reported paying attention to the following aspects of the scenes associated with the story elements: In the introduction, featuring the birth of the little Samurai character, they started to speak slowly and gently in an effort to grasp the hearts of listeners. In the story's climax, depicting the extermination of the devil character, they tried to express a tense feeling through a quick rhythm and tempo. Finally, in the ending, they gradually changed their reading styles to make the audience understand that the happy ending was coming soon.
For all six speakers, a baseline speech segmentation into words and accentual phrases was conducted in a semi-automatic way. We then used a multi-layered prosodic tagging method, performed manually, to provide information on various changes of "story states" relevant to impersonation, emotional involvement, and scene flow control. Figure 1 shows an example of the labeled speech data. Wavesurfer software served as our speech visualization and labeling tool. The example utterance contains a part of the storyteller's speech (the phrase "oniwa bikkuridesu," meaning "the devil was surprised," and the devil's part, "ta ta tasukekuree," meaning "please help me!") and is shown in the top label pane for characters (chrlab). The second label pane (evelab) shows event labels such as scene changes and emotional involvement (desire, joy, fear, etc.). In this example, a "fear" event is attached to the devil's utterance. The dynamic pitch movement can be observed in the pitch contour pane at the bottom of the figure.
How are the scene-change and emotional-involvement events provided by human narrators manifested in the speech data? Prosodic parameters of speed (speech rate in mora/sec), pitch (Hz), power (dB), and preceding pause length (seconds) were investigated for all the breath groups in the speech data. A breath group is a speech segment uttered consecutively without pausing. Figures 2, 3, and 4 show these parameters at a scene-change event (Figure 2), a desire event (Figure 3), and a fear event (Figure 4). The axis on the left of the figures shows the ratio of the parameter to its average value. Each event has its own distinct tendency in the prosodic parameters, as seen in the figures, which seems to be fairly common to all speakers. For instance, the differences between the scene-change event and the desire event lie in the amount of preceding pause and the degree of the contributions from the other three parameters. The fear event shows a quite different tendency from the other events, but it is common to all speakers, though the degree of parameter movement differs between speakers. Figure 5 shows how narrators express character differences, when impersonating the story's characters, with the three parameters. In short, speed and pitch are changed dynamically for impersonation, and this is a common tendency of all speakers.
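Normalizing each breath group's parameters by their averages over the whole narration, as plotted on the left axes of Figures 2-4, can be sketched as follows (an illustrative fragment with hypothetical field names, not the study's analysis code):

```python
import statistics

def prosodic_ratios(breath_groups):
    """Express each breath group's speed (mora/sec), pitch (Hz) and power (dB)
    as a ratio to its average over the whole narration, as in Figures 2-4."""
    keys = ('speed', 'pitch', 'power')
    means = {k: statistics.mean(bg[k] for bg in breath_groups) for k in keys}
    return [{k: bg[k] / means[k] for k in keys} for bg in breath_groups]

# Two hypothetical breath groups: the second is faster, lower-pitched, louder
groups = [{'speed': 8.0, 'pitch': 220.0, 'power': 60.0},
          {'speed': 12.0, 'pitch': 180.0, 'power': 70.0}]
ratios = prosodic_ratios(groups)
```

A ratio above 1.0 means the breath group is faster, higher, or louder than the narrator's average, so event-related deviations can be compared across speakers whose absolute speaking rates and pitch ranges differ.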
Based on the findings obtained from these human narrations, we are designing a framework for mapping story events, through scene changes and emotional involvement, to prosodic parameters. At the same time, it is necessary to build additional databases to validate and reinforce the story-event description and mapping framework.
 Wavesurfer: http://www.speech.kth.se/wavesurfer/
Aero-Acoustic Noise and Control Lab – Seoryong Park – email@example.com
School of Mechanical and Aerospace Eng., Seoul National University
301-1214, 1 Gwanak-ro, Gwanak-gu, Seoul 151-742, Republic of Korea
Popular version of paper 4aEA1, “Integrated simulation model for prediction of acoustic environment of launch vehicle”
Presented Thursday morning, December 1, 2016
172nd ASA Meeting, Honolulu
Literally speaking, a "sound" is a pressure fluctuation of the air. When we hear a bus passing, for example, our ears are sensing the pressure fluctuation the bus created. In daily life, common noises rarely involve significant pressure fluctuations, but in special cases they do. Movies often show windows shattering when someone screams loudly or at a high pitch. This is usually exaggerated, but not outside the realm of what is physically possible.
The pressure fluctuations that constitute sound can cause engineering problems for loud structures such as rockets: because louder sounds correspond to larger pressure fluctuations, they can cause more damage. Rocket launches are particularly loud, and the resulting pressure change in the air acts on the surface of the launched vehicle as the force shown in Figure 1.
Figure 1. The Magnitude of Acoustic Loads on the Launch Vehicle
As the vehicle is launched (Figure 2), the noise reaches levels over 180 dB, which corresponds to about 20,000 Pascals of pressure change. This is about 20% of atmospheric pressure, which is considered very large. Because of the pressure change during launch, communication equipment and antenna panels can incur damage, causing malfunction of the fairing, the protective cone covering the satellite. In the engineering field, the load created by the launch noise is called the acoustic load, and many studies related to acoustic loads are in progress.
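The quoted numbers are easy to check: a sound pressure level L in dB corresponds to a pressure of p_ref · 10^(L/20), with the standard reference pressure p_ref = 20 µPa, so 180 dB gives 20,000 Pa, close to 20% of one standard atmosphere:

```python
P_REF = 20e-6     # reference pressure for airborne sound, 20 micropascals
ATM = 101325.0    # one standard atmosphere, in pascals

def spl_to_pascals(level_db):
    """Convert a sound pressure level in dB (re 20 uPa) to pressure in Pa."""
    return P_REF * 10.0 ** (level_db / 20.0)

pressure = spl_to_pascals(180.0)      # 20,000 Pa
fraction_of_atm = pressure / ATM      # roughly 0.20, i.e. about 20% of an atmosphere
```

The same formula shows why everyday sounds are harmless: an 80 dB street corresponds to only 0.2 Pa, a hundred-thousandth of the launch-pad level.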
Studies on the relationship between a launch vehicle and its acoustic load are categorized, by rocket engineers, under "prediction and control." Prediction is divided into two aspects: internal acoustic load and external acoustic load. Internal acoustic load refers to sound delivered from outside to inside the vehicle, while external acoustic load is the noise coming directly from the jet plume. There are two ways to predict the external acoustic load: an empirical method and a numerical method. The empirical method was developed by NASA in 1972 and uses information collected from various studies. The numerical method solves mathematical formulas for noise generation and propagation using computer modeling. As computers become more powerful, this method continues to gain favor. However, because numerical methods require so much calculation time, they often require dedicated computing centers. Our team instead focused on the more efficient and faster empirical method.
Figure 3 shows the results of our calculations, depicting the expected sound spectrum. We can account for various physical effects involved during lift-off, such as sound reflection, diffraction, and impingement, that could affect the results of the original empirical method.
Meanwhile, our team used a statistical energy analysis method to predict the internal acoustic load caused by the predicted external acoustic load. This method is often used to predict internal noise environments, for launch vehicles as well as for aircraft and automobiles. Our research team used a program called VA One SEA to predict these noise effects, as shown in Figure 4.
Figure 4. Modeling of the Payloads and Forcing of the External Acoustic Loads
After predicting the internal acoustic load, we worked to decrease it in an internal noise control study. A common approach is to attach noise-reducing material to the structure. However, the extra weight of the noise-reducing material can degrade performance. To overcome this side effect, we also began a study of active noise control, which is still in progress. Active noise control reduces noise by generating antiphase waves that cancel the sound. Figure 5 shows the experimental results of the applied SISO noise control: the reduction in noise is significant, especially at low frequencies.
Figure 5. Experimental Results of SISO Active Noise Control
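The principle behind the antiphase cancellation, and why it favors low frequencies, can be sketched in a few lines: a control signal that is a perfect inverted copy cancels exactly, but any fixed delay in the control path leaves a residual that grows with frequency. This toy fragment (illustrative assumptions: an 8 kHz sample rate and a one-sample control delay, not the experimental setup) compares the residual for a 50 Hz and a 2 kHz tone:

```python
import math

FS = 8000  # assumed sample rate, Hz

def residual_rms(freq_hz, delay_samples=1, n=FS):
    """RMS of a tone summed with its inverted (antiphase) copy when the
    control path lags by a fixed delay. Perfect alignment cancels exactly;
    the same delay is a larger phase error at higher frequencies."""
    tone = [math.sin(2.0 * math.pi * freq_hz * i / FS) for i in range(n)]
    anti = [0.0] * delay_samples + [-x for x in tone[:n - delay_samples]]
    residual = [a + b for a, b in zip(tone, anti)]
    return math.sqrt(sum(x * x for x in residual) / n)

low = residual_rms(50.0)     # ~0.03: the 50 Hz tone is almost cancelled
high = residual_rms(2000.0)  # ~1.0: the same delay barely touches 2 kHz
```

This frequency dependence is one reason the measured reductions in Figure 5 are largest at low frequencies.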
Our research team applied the acoustic load prediction and control methods to the Korean launch vehicle KSR-III. Through this application, we developed an improved empirical prediction method that is more accurate than previous methods, and we confirmed the usefulness of the noise control by establishing the algorithm best suited to our experimental facilities and the active noise control setup.
Acoustic Cloaking Using the Principles of Active Noise Cancellation
Jordan Cheer – firstname.lastname@example.org
Institute of Sound and Vibration Research
University of Southampton
Popular version of paper 4pEA7, “Cancellation, reproduction and cloaking using sound field control”
Presented Thursday morning, December 1, 2016
172nd ASA Meeting, Honolulu
Loudspeakers are synonymous with audio reproduction and are widely used to play sounds people want to hear. Loudspeakers have also been used for the opposite purpose: to attenuate noise that people may not want to hear. Active noise cancellation technology is an example of this; it combines loudspeakers, microphones, and digital signal processing to adaptively control unwanted noise sources.
More recently, the scientific community has focused attention on controlling and manipulating sound fields to acoustically cloak objects, with the aim of rendering objects acoustically invisible. A new class of engineered materials called metamaterials has already demonstrated this ability. However, acoustic cloaking has also been demonstrated using methods based on both sound field reproduction and active noise cancellation. Despite these demonstrations, there has been limited research exploring the physical links between acoustic cloaking, active noise cancellation, and sound field reproduction. We therefore began exploring these links, with the aim of developing active acoustic cloaking systems that build on the advanced knowledge of implementing both audio reproduction and active noise cancellation systems.
Acoustic cloaking attempts to control the sound scattered from a solid object. Using a numerical computer simulation, we therefore investigated the physical limits on active acoustic cloaking in the presence of a rigid scattering sphere. The scattering sphere, shown in Figure 1, was surrounded by an array of sources (loudspeakers) used to control the sound field, shown by the black dots surrounding the sphere in the figure. In the first instance we investigated the effect of the scattering sphere on a simple sound field.
Looking at a horizontal slice through the simulated sound field without the scattering object, shown in the second figure, and comparing it to the same slice when the object is present, shown in the third figure, the modifications caused by the scattering sphere are obvious. Scattering from the sphere distorts the sound field, rendering it acoustically visible.
Figure 1 – The geometry of the rigid scattering sphere and the array of sources, or loudspeakers used to control the sound field (black dots).
Figure 2 – The sound field due to an acoustic plane wave in the free field (without scattering).
Figure 3 – The sound field produced when an acoustic plane wave is incident on the rigid scattering sphere.
To understand the physical limitations on controlling this sound field, and thus implementing an active acoustic cloak, we investigated the ability of the array of loudspeakers surrounding the scattering sphere to achieve acoustic cloaking. In contrast to active noise cancellation, rather than attempting to cancel the total sound field, we attempted to control only the scattered component of the sound field and thus render the sphere acoustically invisible.
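Controlling only the scattered component can be posed as a least-squares problem: choose complex loudspeaker strengths q that minimize the magnitude of p_scat + Gq, where G is the transfer matrix from the control sources to a set of observation points. This generic numerical sketch shows the idea (G and p_scat here are random stand-ins, not the paper's acoustic model, and p_scat is deliberately constructed to lie in the range of G so that exact cancellation exists):

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, n_sources = 32, 12
# G: transfer matrix from the control loudspeakers to the observation points
G = (rng.standard_normal((n_points, n_sources))
     + 1j * rng.standard_normal((n_points, n_sources)))
# Scattered field to cancel, built from G so a perfect solution exists
p_scat = G @ (rng.standard_normal(n_sources) + 1j * rng.standard_normal(n_sources))

# Least-squares control: source strengths q minimizing |p_scat + G q|
q = -np.linalg.lstsq(G, p_scat, rcond=None)[0]
residual = p_scat + G @ q      # remaining scattered field after cloaking
```

In a real system the scattered field is not exactly reachable by the source array, so a non-zero residual remains; this is one reason cloaking performance depends on how close the loudspeakers are to the object.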
With active acoustic cloaking, the scattered component is significantly attenuated and the sound field appears undisturbed: the resulting field, shown in the fourth figure, is indistinguishable from the object-less simulation of Figure 2.
Figure 4 – The sound field produced when active acoustic cloaking is used to attempt to cancel the sound field scattered by a rigid scattering sphere and thus render the scattering sphere acoustically ‘invisible’.
Our results indicate that active acoustic cloaking can be achieved using an array of loudspeakers surrounding a sphere that would otherwise scatter sound detectably. In this and related work, further investigations showed that active acoustic cloaking performs best when the loudspeakers are in close proximity to the object being cloaked. This may lead to design concepts involving acoustic sources embedded in objects for acoustic cloaking or control of the scattered sound field.
Future work will attempt to demonstrate the performance of active acoustic cloaking experimentally and overcome significant challenges of not only controlling the scattered sound field, but detecting it using an array of microphones.
- P. Nelson and S. J. Elliott, Active Control of Sound, 436 pp. (Academic Press, London, 1992).
- L. Zigoneanu, B.I. Popa, and S.A. Cummer, "Three-dimensional broadband omnidirectional acoustic ground cloak," Nat. Mater. 13(4), 352-355 (2014).
- E. Friot and C. Bordier, "Real-time active suppression of scattered acoustic radiation," J. Sound Vib. 278, 563-580 (2004).
- J. Cheer, "Active control of scattered acoustic fields: Cancellation, reproduction and cloaking," J. Acoust. Soc. Am. 140(3), 1502-1512 (2016).
Shape changing artificial ear inspired by bats enriches speech signals
Anupam K. Gupta 1,2, Jin-Ping Han 2, Philip Caspers 1, Xiaodong Cui 2, Rolf Müller 1
1 Dept. of Mechanical Engineering, Virginia Tech, Blacksburg, VA, USA
2 IBM T. J. Watson Research Center, Yorktown, NY, USA
Contact: Jin-Ping Han – email@example.com
Popular version of paper 1aSC31, “Horseshoe bat inspired reception dynamics embed dynamic features into speech signals.”
Presented Monday morning, November 28, 2016
172nd ASA Meeting, Honolulu
Have you ever had difficulty understanding what someone was saying to you while walking down a busy big city street, or in a crowded restaurant? Even if that person was right next to you? Words can become difficult to make out when they get jumbled with the ambient noise – cars honking, other voices – making it hard for our ears to pick up what we want to hear. But this is not so for bats. Their ears can move and change shape to precisely pick out specific sounds in their environment.
This biosonar capability inspired our artificial-ear research aimed at improving the accuracy of automatic speech recognition (ASR) systems and speaker localization. We asked: could we enrich a speech signal with direction-dependent, dynamic features by using bat-inspired reception dynamics?
Horseshoe bats, for example, which are found throughout Africa, Europe, and Asia and are so named for the shape of their noses, can change the shape of their outer ears to help extract additional information about the environment from incoming ultrasonic echoes. Their sophisticated biosonar systems emit ultrasonic pulses and listen to the echoes that reflect back after hitting surrounding objects, changing their ear shape as they do so (something other mammals cannot do). This allows them to learn about the environment, helping them navigate and hunt in their home of dense forests.
While probing the environment, horseshoe bats change their ear shape to modulate the incoming echoes, increasing the information content embedded in the echoes. We believe that this shape change is one of the reasons bats’ sonar exhibit such high performance compared to technical sonar systems of similar size.
To test this, we first built a robotic bat head that mimics the ear shape changes we observed in horseshoe bats.
Figure 1: Horseshoe bat inspired robotic set-up used to record speech signal
We then recorded speech signals to explore whether using shape change, inspired by the bats, could embed direction-dependent dynamic features into speech signals. The potential applications range from improving hearing aid accuracy to helping a machine more accurately hear – and learn from – sounds in real-world environments.
We compiled a digital dataset of 11 US English speakers from open-source speech collections provided by Carnegie Mellon University. The human utterances were shifted to the ultrasonic domain so that our robot could play them back into its microphones while the biomimetic bat head actively moved its ears. The signals at the base of the ears were then translated back to the speech domain to recover the original signal.
This pilot study, performed at IBM Research in collaboration with Virginia Tech, showed that the ear shape change was, in fact, able to significantly modulate the signal and concluded that these changes, like in horseshoe bats, embed dynamic patterns into speech signals.
The dynamically enriched data we explored improved the accuracy of speech recognition. Compared to a traditional system for hearing and recognizing speech in noisy environments, adding structural movement to a complex outer shape surrounding a microphone, mimicking an ear, significantly improved its performance and access to directional information. In the future, this might improve performance in devices operating in difficult hearing scenarios like a busy street in a metropolitan center.
Figure 2: Example of speech signal recorded without and with the dynamic ear. Top row: speech signal without the dynamic ear, Bottom row: speech signal with the dynamic ear