2aSP2 – Self-Driving Cars: Have You Considered Using Sound?

Keegan Yi Hang Sim – yhsim@connect.ust.hk
Yijia Chen
Yuxuan Wan
Kevin Chau

Department of Electronic and Computer Engineering
Hong Kong University of Science and Technology
Clear Water Bay
Hong Kong

Popular version of paper 2aSP2 Advanced automobile crash detection by acoustic methods
Presented Tuesday morning, December 03, 2019
178th Meeting, San Diego, CA
Read the article in Proceedings of Meetings on Acoustics

Introduction
Self-driving cars are currently a major interest for engineers around the globe. They incorporate more advanced versions of the steering and acceleration control found in many of today’s cars. Cameras, radars, and lidars (light detection and ranging) are frequently used to detect obstacles and automatically brake to avoid collisions. Air bags, which were first patented in the early 1950s, soften the impact during an actual collision.

Vision Zero, an ongoing multinational effort, hopes that all car crashes will eventually be eliminated, and autonomous vehicles are likely to play a key role in achieving this. However, current technology is unlikely to be enough, as it works poorly in low-light conditions. We believe that sound, although it provides less information than cameras or lidars, is also important: it carries unique cues and can be used in all scenarios.

Sound travels at roughly one third of a kilometer per second in air, and up to seventeen times faster through the body of a car, so using sound leads to much faster detection than waiting for acceleration measurements, and sound is not affected by light, air quality, or other such factors. Previous research has used sound to detect collisions and sirens, but by the time a collision occurs, it is far too late. Instead, we want to identify sounds that frequently occur before car crashes, such as tire skidding, honking, and sometimes screaming, and figure out the direction they are coming from. We have therefore designed a method to predict a car crash by detecting and isolating the sounds, such as tire skidding, that might signal a possible crash.

Algorithm
The algorithm utilizes the discrete wavelet transform (DWT), which decomposes a sound wave into high- and low-frequency components that are localized in time. This can be done repeatedly, yielding a series of components at various frequencies. Wavelets are significantly faster to compute and give a much more accurate representation of the transient events associated with car crashes than elementary techniques such as the Fourier Transform, which decomposes a sound into steady oscillations. Previous methods to detect car crashes examined the highest-frequency components, but tire skidding only contains lower-frequency components, whereas a collision contains almost all frequencies.
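For readers who want to experiment, here is a minimal sketch of this kind of decomposition using the PyWavelets library; the wavelet choice ('db4'), the number of levels, and the input file name are illustrative assumptions, not the settings used in the paper.

```python
# Minimal sketch: multi-level discrete wavelet transform of a recording.
# The wavelet ('db4'), the number of levels, and the file name are
# illustrative assumptions, not the settings used in the paper.
import numpy as np
import soundfile as sf   # pip install soundfile
import pywt              # pip install PyWavelets

audio, fs = sf.read("crash.wav")      # hypothetical recording of a crash scene
if audio.ndim > 1:
    audio = audio.mean(axis=1)        # mix down to mono

# Decompose into one approximation (low-frequency) band and several
# detail (higher-frequency) bands, each localized in time.
coeffs = pywt.wavedec(audio, "db4", level=5)
for i, c in enumerate(coeffs):
    label = "approximation" if i == 0 else f"detail level {6 - i}"
    print(f"{label}: {len(c)} coefficients")
```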

In the original audio of a car crash, one can hear three distinct sections: honking, tire skidding, and the collision.

The top diagram shows the audio as a waveform, plotted against time. The bottom shows a spectrogram of the audio, with frequency on the vertical axis and time on the horizontal axis, and the brightness of the color representing the magnitude of a particular frequency component. It was created using a variation of the Fourier Transform. One can observe the differences in appearance between honking, tire skidding, and the collision, which suggests that mathematical methods should be able to detect and isolate them. We can also see that the collision occupies all frequencies, while tire skidding occupies lower frequencies with two distinct sharp bands at around 2000 Hz.
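A spectrogram like this can be generated with the short-time Fourier transform available in SciPy; a minimal sketch is shown below (the window length, overlap, and file name are assumptions).

```python
# Sketch: spectrogram of the recording via a short-time Fourier transform.
# Window length, overlap, and the file name are illustrative assumptions.
import matplotlib.pyplot as plt
import numpy as np
import soundfile as sf
from scipy import signal

audio, fs = sf.read("crash.wav")
if audio.ndim > 1:
    audio = audio.mean(axis=1)

f, t, Sxx = signal.spectrogram(audio, fs, nperseg=1024, noverlap=512)
plt.pcolormesh(t, f, 10 * np.log10(Sxx + 1e-12), shading="auto")  # brightness = level in dB
plt.xlabel("Time (s)")
plt.ylabel("Frequency (Hz)")
plt.show()
```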

“OutCollision.wav – the isolated audio containing just the car crash”

Using our algorithm, we were able to create audio files that isolate the honking, the tire skidding, and the collision. They may not sound like normal honking, tire skidding, or collisions, which is a byproduct of our algorithm, but this does not affect a computer’s ability to detect these events.
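The isolation step can be illustrated roughly as follows: keep the wavelet coefficients in the band of interest, zero out the rest, and reconstruct the waveform. Which detail levels to keep and the output file name are hypothetical choices here; the band selection in the paper is more sophisticated.

```python
# Rough sketch: isolate one frequency band by zeroing the other wavelet
# coefficients and reconstructing. The levels kept and the file names are
# hypothetical; the paper's band selection is more sophisticated.
import numpy as np
import soundfile as sf
import pywt

audio, fs = sf.read("crash.wav")
if audio.ndim > 1:
    audio = audio.mean(axis=1)

coeffs = pywt.wavedec(audio, "db4", level=5)
keep = {2, 3}   # hypothetical detail bands assumed to contain the skid energy
isolated = [c if i in keep else np.zeros_like(c) for i, c in enumerate(coeffs)]

skid = pywt.waverec(isolated, "db4")
sf.write("OutSkid.wav", skid[: len(audio)], fs)   # hypothetical output file
```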

Conclusion
The algorithm performs well at detecting honking and tire skidding, and is fast enough to run in real time, before acceleration information can be processed. This makes it well suited to raising an alert about a possible crash and to activating the hazard lights and seatbelt pre-tensioners. The use of sound in cars is a big step forward for the prevention of car crashes, as well as for improving autonomous and driverless vehicles and achieving Vision Zero, by providing a car with more timely and valuable information about its surroundings.

2aSPa8 and 4aSP6 – Safe and Sound – Using acoustics to improve the safety of autonomous vehicles

Eoin A King – eoking@hartford.edu
Akin Tatoglu
Digno Iglesias
Anthony Matriss
Ethan Wagner

Department of Mechanical Engineering
University of Hartford
200 Bloomfield Avenue
West Hartford, CT 06117

Popular version of papers 2aSPa8 and 4aSP6
Presented Tuesday and Thursday morning, May 14 & 16, 2019
177th ASA Meeting, Louisville, KY

Introduction
In cities across the world every day, people use and process acoustic alerts to interact safely in and amongst traffic; drivers listen for emergency sirens from police cars or fire engines, or the sounding of a horn warning of an impending collision, while pedestrians listen for cars when crossing a road. A city is full of sounds with meaning, and these sounds make the city a safer place.

Future cities will see the large-scale deployment of (semi-)autonomous vehicles (AVs). AV technology is quickly becoming a reality; however, the manner in which AVs and other vehicles will coexist and communicate with one another is still unclear, especially during the prolonged period in which mixed vehicle types will share the road. In particular, the manner in which autonomous vehicles can use acoustic cues to supplement their decision-making process is an area that needs development.

The research presented here aims to identify the meaning behind specific sounds in a city related to safety. We are developing methodologies to recognize and locate acoustic alerts in cities and use this information to inform the decision-making process of all road users, with particular emphasis on autonomous vehicles. Initially we aim to define a new set of audio-visual detection and localization tools to identify the location of a rapidly approaching emergency vehicle. In short, we are trying to develop the ‘ears’ to complement the ‘eyes’ already present on autonomous vehicles.

Test Set-Up
For our initial tests we developed a low-cost array consisting of two linear arrays of four MEMS microphones. The array was used in conjunction with a mobile robot equipped with visual sensors, as shown in Picture 1. Our array acquired acoustic signals that were analyzed to i) identify the presence of an emergency siren, and then ii) determine the location of the sound source (which was occasionally behind an obstacle). Initially our tests were conducted in the controlled setting of an anechoic chamber.


Picture 1: Test Robot with Acoustic Array

Step 1: Using convolutional neural networks for the detection of an emergency siren
Using advanced machine learning techniques, it has become possible to ‘teach’ a machine (or a vehicle) to recognize certain sounds. We used a deep Convolutional Neural Network (CNN) and trained it to recognize emergency sirens in real time, achieving 99.5% accuracy on test audio signals.
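The sketch below shows the general shape of such a classifier operating on spectrogram patches; the layer sizes and the 64×64 input are illustrative assumptions, not the network used in the study.

```python
# Sketch of a small CNN that labels spectrogram patches as "siren" / "no siren".
# Layer sizes and the 64x64 input are illustrative assumptions, not the
# architecture used in the study.
import torch
import torch.nn as nn

class SirenCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(64 * 8 * 8, 2)   # two classes: siren / not siren

    def forward(self, x):              # x: (batch, 1, 64, 64) spectrogram patches
        return self.classifier(self.features(x).flatten(1))

model = SirenCNN()
scores = model(torch.randn(4, 1, 64, 64))   # dummy batch of four patches
print(scores.shape)                          # torch.Size([4, 2])
```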

Step 2: Identifying the location of the source of the emergency siren
Once an emergency sound has been detected, it must be rapidly localized. This is a complex task in a city environment, due to moving sources, reflections from buildings, other noise sources, etc. However, by combining acoustic results with information acquired from the visual sensors already present on an autonomous vehicle, it will be possible to identify the location of a sound source. In our research, we modified an existing direction-of-arrival algorithm to report a number of sound source directions, arising from multiple reflections in the environment (i.e. every reflection is recognized as an individual source). These results can be combined with the 3D map of the area acquired from the robot’s visual sensors. A reverse ray tracing approach can then be used to triangulate the likely position of the source.
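As a much simpler illustration of acoustic direction finding, the sketch below estimates a single arrival angle for one microphone pair using the widely used GCC-PHAT time-delay method; the microphone spacing, sampling rate, and signals are assumptions, and the multi-direction method described in the paper is different.

```python
# Simplified sketch: estimate an arrival angle for one microphone pair with
# GCC-PHAT. Spacing, sample rate, and the simulated signals are illustrative
# assumptions; the paper uses a modified multi-direction method.
import numpy as np

def gcc_phat_delay(x1, x2, fs):
    """Delay of x2 relative to x1, in seconds (positive if x2 arrives later)."""
    n = len(x1) + len(x2)
    X1, X2 = np.fft.rfft(x1, n), np.fft.rfft(x2, n)
    R = X2 * np.conj(X1)
    R /= np.abs(R) + 1e-12                     # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n)
    max_lag = n // 2
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))  # lags -max_lag..max_lag
    return (np.argmax(np.abs(cc)) - max_lag) / fs

fs, d, c = 48_000, 0.05, 343.0      # sample rate (Hz), mic spacing (m), speed of sound (m/s)
rng = np.random.default_rng(0)
sig = rng.standard_normal(fs)       # stand-in for one second of a recorded siren
mic1, mic2 = sig, np.roll(sig, 4)   # second microphone hears it 4 samples later

tau = gcc_phat_delay(mic1, mic2, fs)
angle = np.degrees(np.arcsin(np.clip(tau * c / d, -1.0, 1.0)))
print(f"estimated arrival angle: {angle:.1f} degrees")
```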

Picture 2: Example test results. Note that in this test our array indicates a source at approximately 30° and another at approximately -60°.

Picture 3: Ray Trace Method. Note, by tracing the path of the estimated angles, both reflected and direct, the approximate source location can be triangulated.

Video explaining theory.

4aSP4 – Streaming Video through Biological Tissues using Ultrasonic Communication

Gizem Tabak – tabak2@illinois.edu
Michael Oelze – oelze@illinois.edu
Andrew Singer – acsinger@illinois.edu
University of Illinois at Urbana-Champaign
306 N Wright St
Urbana, IL 61801

Popular version of paper 4aSP4
Presented Thursday morning, May 16, 2019
177th ASA Meeting, Louisville, KY

Researchers at the University of Illinois at Urbana-Champaign have developed a fast, wireless communication alternative that also has biomedical implications. Instead of using radio frequency (RF) to transmit signals, the team is using ultrasonic waves to send signals at high enough data rates to transmit video through animal or human tissue.

The team of electrical and computer engineering professors Andrew Singer and Michael Oelze and graduate researcher Gizem Tabak has achieved a transmission rate of 4 megabits per second through animal tissue with 2-mm transmitting devices. This rate is high enough to send high-definition video (3 Mbps) and is 15 times faster than what RF waves can currently deliver.


Figure 1 – Experimental setup for streaming at 4 Mbps through 2” beef liver

The team is using this approach to communicate with implanted medical devices, like those used to scan tissue in a patient’s gastrointestinal (GI) tract.

Currently, one of two methods is used to image the GI tract. The first is video endoscopy, which involves inserting a long probe with a camera and light down the throat to take real-time video and send it to an attached computer. This method is limited in that it cannot reach the midsection of the GI tract and is highly invasive.

The second method involves a patient swallowing a pill that contains a mini camera that can take images throughout the tract. After a day or so, the pill is retrieved, and the physician can extract the images. This method, however, is entirely offline, meaning there is no real-time interaction with the camera inside the patient.

A third option uses the camera pill approach but sends the images through RF waves, which are absorbed by the surrounding tissue. Due to safety regulations governing electromagnetic radiation, the transmitted signal power is limited, resulting in data rates of only 267 kilobits per second.

The Illinois team is proposing to use ultrasound, a method that has already proven safe for medical imaging, as a communication method. Having achieved data rates of 4 Mbps with this system through animal tissue, the team is translating the approach to operate in real-time for use in the human body.

Pairing this communication technology with the camera pill approach, the device not only could send real-time video, but also could be remotely controlled. For example, it might travel to specific areas and rotate to arbitrary orientations. It may even be possible to take tissue samples for biopsy, essentially replacing endoscopic procedures or surgeries through such mini-remote controlled robotic devices.

1aSP1 – From Paper Cranes to New Tech Gains: Frequency Tuning through Origami Folding

Kazuko Fuchi – kfuchi1@udayton.edu
University of Dayton Research Institute
300 College Park, Dayton, OH 45469

Andrew Gillman – andrew.gillman.1.ctr@us.af.mil
Alexander Pankonien – alexander.pankonien.1@us.af.mil
Philip Buskohl – philip.buskohl.1@us.af.mil
Air Force Research Laboratory
Wright-Patterson Air Force Base, OH 45433

Deanna Sessions – deanna.sessions@psu.edu
Gregory Huff – ghuff@psu.edu
Department of Electrical Engineering and Computer Science
Penn State University
207 Electrical Engineering West, University Park, PA 16802

Popular version of lecture: 1aSP1 Topology optimization of origami-inspired reconfigurable frequency selective surfaces
Presented Monday morning, 9:00 AM – 11:15 AM, May 13, 2019
177th ASA Meeting, Louisville, Kentucky

The use of mathematics and computer algorithms by origami artists has led to a renaissance of the art of origami in recent decades. Combining scientific tools with their imagination and artistic skills, these artists discover intricate origami designs that inspire expansive possibilities of the art form.

The intrigue of realizing incredibly complex creatures and exquisite patterns from a piece of paper has captured the attention of the scientific and engineering communities. Our research team and others in the engineering community wanted to make use of the language of origami, which gives us a natural way to navigate complex geometric transformations through 2D (flat), 3D (folded) and 4D (folding motion) spaces. This beautiful language has enabled numerous innovative technologies including foldable and deployable satellites, self-folding medical devices and shape-changing robots.

Origami, as it turns out, is also useful in controlling how sound and radio waves travel. An electromagnetic device called an origami frequency selective surface for radio waves can be created by laser-scoring and folding a plastic sheet into a repeating pattern called a periodic tessellation, and printing electrically conductive copper decorations aligned with the pattern on the sheet (Figure 1). We have shown that this origami folded device can be used as a filter to block unwanted signals at a specific operating frequency. We can fold and unfold this device to tune the operating frequency, or we can design a device that can be folded, unfolded, bent and twisted into a complex surface shape without changing the operating frequency, all depending on the design of the folding and printing patterns. These findings encourage more research in origami-based innovative designs to accomplish demanding goals for radar, communication and sensor technologies.

Figure 1: Fabricated prototype of origami folded frequency selective surface made of a folded plastic sheet and copper prints, ready to be tested in an anechoic chamber – a room padded with radio-wave-absorbing foam pyramids.

Origami can be used to choreograph complex geometric rearrangements of the active components. In the case of our frequency selective surface, the folded plastic sheet acts as the medium that hosts the electrically active copper prints. As the sheet is folded, the copper prints fold and move relative to each other in a controlled manner. We used our theoretical knowledge along with insight gained from computer simulations to understand how the rearrangements impact the physics of the device’s working mechanism and to decide what designs to fabricate and test in the real world. In this, we attempt to imitate the origami artist’s magical creation of awe-inspiring art in the engineering domain.

5pSP6 – Assessing the Accuracy of Head Related Transfer Functions in a Virtual Reality Environment

Joseph Esce – esce@hartford.edu
Eoin A King – eoking@hartford.edu
Acoustics Program and Lab
Department of Mechanical Engineering
University of Hartford
200 Bloomfield Avenue
West Hartford
CT 06119
U.S.A

Popular version of paper 5pSP6: “Assessing the Accuracy of Head Related Transfer Functions in a Virtual Reality Environment”, presented Friday afternoon, November 9, 2018, 2:30 – 2:45pm, RATTENBURY A/B, ASA 176th Meeting/2018 Acoustics Week in Canada, Victoria, Canada.

Introduction
While visual graphics in Virtual Reality (VR) systems are very well developed, the manner in which acoustic environments and sounds are recreated in a VR system is not. Currently, the standard procedure to represent sound in a virtual environment is to use a generic head related transfer function (HRTF); i.e., a user selects a generic HRTF from a library, based on limited personal information. It is essentially a ‘best-guess’ representation of an individual’s perception of a sound source. This limits the accuracy of the representation of the acoustic environment, as every person has a HRTF that is unique to them.

What is a HRTF?
If you close your eyes and someone jangles keys behind your head, you will be able to identify the general location of the keys just from the sound you hear. This is because the sound is filtered by your head, torso and outer ears before it reaches your eardrums, and that filtering depends on where the sound comes from. A HRTF is a mathematical function that captures these transformations, and it can be used to recreate the sound of those keys in a pair of headphones – so that the recording of the keys appears to have a direction associated with it. However, everyone has vastly different ear and head shapes, so HRTFs are unique to each person. The objective of our work was to determine how the accuracy of sound localization in a VR world varies for different users, and how we can improve it.
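In its time-domain form, the HRTF is a pair of head-related impulse responses, one per ear, and applying it to a mono recording is just a convolution. A toy sketch is shown below; the file names are hypothetical, and a real system would use a measured or database HRIR set.

```python
# Toy sketch: spatialize a mono sound with a head-related impulse response pair
# (the time-domain form of an HRTF). File names are hypothetical; a real system
# would use a measured or database HRIR set.
import numpy as np
import soundfile as sf
from scipy.signal import fftconvolve

keys, fs = sf.read("keys_mono.wav")              # hypothetical mono recording
hrir_left, _ = sf.read("hrir_left_30deg.wav")    # hypothetical HRIR pair for a source
hrir_right, _ = sf.read("hrir_right_30deg.wav")  # 30 degrees to the listener's left

left = fftconvolve(keys, hrir_left)
right = fftconvolve(keys, hrir_right)

binaural = np.stack([left, right], axis=1)       # two channels for headphones
binaural /= np.max(np.abs(binaural))             # normalize to avoid clipping
sf.write("keys_binaural.wav", binaural, fs)
```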

Test procedure
In our tests, volunteers entered a VR world, which was essentially an empty room, and an invisible sound source made short bursts of noise at various positions in the room. Volunteers were asked to point to the location of the sound source, and results were captured, to the nearest millimeter, using the VR system’s motion tracking. We tested three cases: 1) volunteers were not allowed to move their head to assist in localization, 2) some slight head movements were allowed, and 3) volunteers could turn around freely and ‘search’ (with their ears) for the sound source. Head movement was monitored by using the VR system to track the volunteer’s eye movement, and if the volunteer moved, the sound source was switched off.

Results
We observed that the accuracy with which volunteers localized the sound source varied significantly from person to person. There was significant error when volunteers’ head movements were restricted, but accuracy improved markedly when people were able to move around and listen for the sound source. This suggests that the initial impression of a sound’s location in a VR world is refined once the user can move their head to search for it.

Future Work
We are currently analyzing our results in more detail to account for the different characteristics of each user (e.g., head size, ear size and shape). Further, we aim to develop the experimental methodology to use machine learning algorithms that enable each user to create a pseudo-personalized HRTF, which would improve the immersive experience for all VR users.

5aSP2 – Two-dimensional high-resolution acoustic localization of distributed coherent sources for structural health monitoring

Tyler J. Flynn (t.jayflynn@gmail.com),
David R. Dowling (drd@umich.edu)

University of Michigan
Mechanical Engineering Dept.
Ann Arbor, MI 48109

Popular version of paper 5aSP2 “Two-dimensional high-resolution acoustic localization of distributed coherent sources for structural health monitoring”
Presented Friday morning, 9 November 2018 9:15-9:30am Rattenbury A/B
176th ASA Meeting Victoria, BC

When in use, many structures – like driveshafts, windmill blades, ship hulls, etc. – tend to vibrate, casting pressure waves (aka sound) into the surrounding environment. When worn or damaged, these systems may vibrate differently, resulting in measurable changes to the broadcast sound. This presents an opportunity for the enterprising acoustician: could you monitor systems, and even locate structural defects, at a distance by exploiting acoustic changes? Such a technique would surely be useful for structures that are difficult to reach or that are in challenging environments, like ships in the ocean – though these benefits would come at the cost of the added complexity to measure sound precisely. This work shows that yes, it is possible to localize defects using only acoustic measurements, and such a technique is validated with two proof-of-concept experiments.

In cases where damage affects how a structure vibrates locally (e.g. near the defect), localizing the damage reduces to finding where the source of the sound is changing. The most common method for figuring out where sound is coming from is known as beamforming. Put simply, beamforming involves listening for sounds at different points in space (using multiple microphones, known as an array) and then looking for relative time delays between microphones to back out the direction(s) of the incident sound. This presents two distinct challenges for locating defects. First, the acoustic changes from a defect are pretty small compared to all the sound being generated, so they can easily get ‘washed out’. This can be addressed by using previously recorded measurements of the undamaged structure and subtracting these recordings in a special way so that the differences between the damaged and undamaged structures are localized. Even then, more advanced high-resolution beamforming techniques are needed to precisely pinpoint changes, which leads to the second challenge: sound emitted from vibrating structures is typically coherent (meaning that sounds coming from different directions are strongly related), and this causes problems for high-resolution beamforming. However, a trick can be used wherein the full array of microphones is divided into smaller subarrays that are then averaged in a special way to sidestep the coherence problem.
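As a simplified illustration of the beamforming idea, the sketch below scans candidate directions with a narrowband delay-and-sum beamformer for a small linear array; the geometry, frequency, and simulated signal are assumptions, and the paper's high-resolution method with subarray averaging is considerably more involved.

```python
# Simplified sketch of narrowband delay-and-sum beamforming with a linear
# microphone array: phase-align the channels for each candidate direction and
# find where the summed power peaks. Geometry, frequency, and the simulated
# signal are illustrative assumptions; the paper uses a higher-resolution
# method with subarray averaging.
import numpy as np

fs, c, f0 = 48_000, 343.0, 5_000.0     # sample rate (Hz), sound speed (m/s), tone (Hz)
n_mics, spacing = 8, 0.02              # 8 microphones, 2 cm apart
mic_x = np.arange(n_mics) * spacing

# Simulate a 5 kHz tone arriving as a plane wave from 25 degrees off broadside.
true_angle = np.radians(25.0)
n_samples = 4800                       # chosen so 5 kHz falls exactly on an FFT bin
t = np.arange(n_samples) / fs
delays = mic_x * np.sin(true_angle) / c
signals = np.array([np.cos(2 * np.pi * f0 * (t - d)) for d in delays])

# One narrowband "snapshot": the 5 kHz FFT bin from every microphone.
snapshot = np.fft.rfft(signals, axis=1)[:, int(f0 * n_samples / fs)]

# Steer to each candidate angle; power is largest where the steering phases
# cancel the true inter-microphone delays.
candidates = np.radians(np.linspace(-90.0, 90.0, 361))
powers = [
    np.abs(np.vdot(np.exp(-2j * np.pi * f0 * mic_x * np.sin(theta) / c), snapshot)) ** 2
    for theta in candidates
]
print(f"estimated direction: {np.degrees(candidates[int(np.argmax(powers))]):.1f} degrees")
```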


Figure 1: Experimental setups. The square microphone array sitting above a single speaker source (top left). The microphone array sitting above the clamped aluminum plate that is vibrated from below (right). A close-up of the square microphone array (bottom left).

Two validation experiments were conducted. In the first, an 8×8 array of 64 microphones was used to record 5 kHz pulses from small loudspeakers at various locations on the floor (Figure 1). With three speaker sources in an arbitrary configuration, a recording was made. The volume of one source was then reduced by 20% and another measurement was made. Using the described method (with the 8×8 array subdivided and averaged over 25 4×4 subarrays), the 20% change was precisely located, in excellent agreement with computer simulations of the experiment (Figure 2). To test for actual damage, in the second experiment a 3.8-cm cut was added to a 30-cm-square aluminum plate. The plate, vibrated from below to induce sound, was recorded from above, with and without the cut. Once again using the method described here, the change – the cut – was successfully located (Figure 3), a promising result for practical applications of the technique.

Figure 2: Results of the first experiment. The top row of images uses the proposed technique, while the bottom uses a conventional technique. A ‘subtraction’ between the two very similar acoustic measurements (far left and center left) allows for precise localization of the 20% change (center right) and great agreement with simulated results (far right).

Figure 3: Results of the second experiment. The two left images show vibrational measurement of the plate (vibrated around 830 Hz) with and without the added cut, showing that the cut noticeably affects the vibration. The right image shows high-resolution acoustic localization of the cut using the described technique (at 3600 Hz).