5pSP6 – Assessing the Accuracy of Head Related Transfer Functions in a Virtual Reality Environment

Joseph Esce – esce@hartford.edu
Eoin A King – eoking@hartford.edu
Acoustics Program and Lab
Department of Mechanical Engineering
University of Hartford
200 Bloomfield Avenue
West Hartford
CT 06119
U.S.A

Popular version of paper 5pSP6: “Assessing the Accuracy of Head Related Transfer Functions in a Virtual Reality Environment”, presented Friday afternoon, November 9, 2018, 2:30 – 2:45pm, RATTENBURY A/B, ASA 176th Meeting/2018 Acoustics Week in Canada, Victoria, Canada.

Introduction
While visual graphics in Virtual Reality (VR) systems are very well developed, the methods for recreating acoustic environments and sounds in a VR system are not. Currently, the standard procedure for representing sound in a virtual environment is to use a generic head-related transfer function (HRTF): a user selects a generic HRTF from a library, based on limited personal information. This is essentially a ‘best-guess’ representation of an individual’s perception of a sound source. It limits the accuracy of the representation of the acoustic environment, because every person’s HRTF is unique.

What is a HRTF?
If you close your eyes and someone jangles keys behind your head, you can identify the general location of the keys just from the sound you hear. This is possible because your head and ears filter and delay the sound on its way to each eardrum in a direction-dependent way. An HRTF is a mathematical function that captures these transformations, and it can be used to recreate the sound of those keys in a pair of headphones – so that the recording of the keys appears to come from a particular direction. However, everyone has differently shaped ears and heads, so HRTFs are unique to each person. The objective of our work was to determine how the accuracy of sound localization in a VR world varies for different users, and how we can improve it.
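To illustrate the basic idea, here is a minimal sketch of binaural rendering with an HRTF. It assumes you already have a pair of head-related impulse responses (the time-domain form of an HRTF) for the desired direction, for example from a public database; the HRIRs generated below are placeholders for illustration, not real measurements, and the function names are ours, not from any particular VR toolkit.

```python
# Minimal sketch: spatialize a mono sound by convolving it with an HRIR pair.
import numpy as np
from scipy.signal import fftconvolve

def render_binaural(mono, hrir_left, hrir_right):
    """Convolve a mono signal with left/right HRIRs to give it a direction."""
    left = fftconvolve(mono, hrir_left)
    right = fftconvolve(mono, hrir_right)
    out = np.stack([left, right], axis=1)      # (samples, 2) stereo buffer
    return out / np.max(np.abs(out))           # normalize to avoid clipping

# Example: a 0.5 s noise burst ("jangling keys") with made-up 128-tap HRIRs.
fs = 44100
keys = np.random.randn(fs // 2)
hrir_left = np.random.randn(128) * np.exp(-np.arange(128) / 16.0)
hrir_right = np.roll(hrir_left, 20) * 0.7      # crude interaural delay/level difference
binaural = render_binaural(keys, hrir_left, hrir_right)
```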

Test procedure
In our tests, volunteers entered a VR world, which was essentially an empty room, and an invisible sound source emitted short bursts of noise at various positions in the room. Volunteers were asked to point to the location of the sound source, and their responses were captured to the nearest millimeter using the VR system’s motion tracking. We tested three cases: 1) volunteers were not allowed to move their head to assist in localization, 2) slight head movements were allowed to assist in sound localization, and 3) volunteers could turn around freely and ‘search’ (with their ears) for the sound source. Head movement was monitored by using the VR system to track the volunteer’s gaze, and if the volunteer moved, the sound source was switched off.
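As a concrete illustration of how such pointing responses can be scored, here is a minimal sketch (not the authors’ analysis code) that converts a pointed direction and a known source position, both taken from motion-tracking data, into an angular localization error. All variable names here are assumptions for illustration.

```python
# Minimal sketch: angular error between a pointed direction and the true source direction.
import numpy as np

def angular_error_deg(head_pos, pointed_dir, source_pos):
    """Angle (degrees) between where the volunteer pointed and the true source direction."""
    true_dir = np.asarray(source_pos) - np.asarray(head_pos)
    pointed = np.asarray(pointed_dir)
    cosang = np.dot(pointed, true_dir) / (np.linalg.norm(pointed) * np.linalg.norm(true_dir))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

# Example: source 2 m ahead and 1 m to the right; volunteer points straight ahead.
print(angular_error_deg([0, 0, 0], [0, 0, 1], [1, 0, 2]))   # ~26.6 degrees
```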

Results
We observed that the accuracy with which volunteers were able to localize the sound source varied significantly from person to person. Localization errors were large when volunteers’ head movements were restricted, but accuracy improved significantly when people were able to move around and listen for the sound source. This suggests that the initial impression of a sound’s location in a VR world is refined once the user can move their head and search for the source.

Future Work
We are currently analyzing our results in more detail to account for the different characteristics of each user (e.g., head size, ear size and shape). Further, we aim to extend the experimental methodology with machine learning algorithms so that each user can create a pseudo-personalized HRTF, which would improve the immersive experience for all VR users.

5aSP2 – Two-dimensional high-resolution acoustic localization of distributed coherent sources for structural health monitoring

Tyler J. Flynn (t.jayflynn@gmail.com),
David R. Dowling (drd@umich.edu)

University of Michigan
Mechanical Engineering Dept.
Ann Arbor, MI 48109

Popular version of paper 5aSP2 “Two-dimensional high-resolution acoustic localization of distributed coherent sources for structural health monitoring”
Presented Friday morning, 9 November 2018 9:15-9:30am Rattenbury A/B
176th ASA Meeting Victoria, BC

When in use, many structures – like driveshafts, windmill blades, ship hulls, etc. – tend to vibrate, casting pressure waves (a.k.a. sound) into the surrounding environment. When worn or damaged, these systems may vibrate differently, resulting in measurable changes to the broadcast sound. This presents an opportunity for the enterprising acoustician: could you monitor systems, and even locate structural defects, at a distance by exploiting these acoustic changes? Such a technique would surely be useful for structures that are difficult to reach or that operate in challenging environments, like ships in the ocean – though these benefits come at the cost of the added complexity of measuring sound precisely. This work shows that yes, it is possible to localize defects using only acoustic measurements, and the technique is validated with two proof-of-concept experiments.

In cases where damage affects how a structure vibrates locally (i.e., near the defect), localizing the damage reduces to finding where the source of the sound is changing. The most common method for determining where sound comes from is known as beamforming. Put simply, beamforming involves listening for sounds at different points in space (using multiple microphones, known as an array) and then using the relative time delays between microphones to back out the direction(s) of the incident sound. This presents two distinct challenges for locating defects. 1) The acoustic changes caused by a defect are small compared to all the sound being generated, so they can easily get ‘washed out’. This can be addressed by using previously recorded measurements of the undamaged structure and subtracting these recordings in a special way, so that the differences between the damaged and undamaged structures are what get localized. Even then, more advanced high-resolution beamforming techniques are needed to precisely pinpoint the changes. This leads to the second challenge. 2) Sound emitted from vibrating structures is typically coherent (meaning that sounds coming from different directions are strongly related), which causes problems for high-resolution beamforming. However, a trick can be used: the full array of microphones is divided into smaller subarrays that are then averaged in a special way to side-step the coherence problem.
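To make these two ideas concrete, here is a minimal sketch – not the authors’ code – of a conventional (delay-and-sum) beamformer applied to the difference between baseline and “damaged” measurements, with averaging over subarrays (often called spatial smoothing). It uses a simplified one-dimensional array and synthetic data; the paper’s actual method is two-dimensional and high-resolution.

```python
# Minimal sketch: beamform the *change* between two measurements, with subarray averaging.
import numpy as np

def csm(snapshots):
    """Cross-spectral matrix from complex microphone snapshots (n_mics, n_snaps)."""
    return snapshots @ snapshots.conj().T / snapshots.shape[1]

def steering(mic_pos, angles_rad, freq, c=343.0):
    """Plane-wave steering vectors for a 1-D array (n_mics, n_angles)."""
    k = 2 * np.pi * freq / c
    return np.exp(1j * k * np.outer(mic_pos, np.sin(angles_rad)))

def localize_change(base, damaged, mic_pos, angles_rad, freq, sub_len=4):
    """Beamform the difference of cross-spectral matrices, averaged over
    overlapping subarrays (spatial smoothing) to mitigate source coherence."""
    n_mics = len(mic_pos)
    diff = csm(damaged) - csm(base)                 # keep only what changed
    power = np.zeros(len(angles_rad))
    for start in range(n_mics - sub_len + 1):       # slide a subarray along the array
        sl = slice(start, start + sub_len)
        v = steering(mic_pos[sl], angles_rad, freq)
        power += np.abs(np.einsum('ma,mn,na->a', v.conj(), diff[sl, sl], v))
    return power / (n_mics - sub_len + 1)

# Example with synthetic data: 8 mics at half-wavelength spacing, new source at 20 degrees.
freq, c = 5000.0, 343.0
mic_pos = np.arange(8) * (c / freq) / 2
angles = np.deg2rad(np.linspace(-90, 90, 181))
rng = np.random.default_rng(0)
base = rng.standard_normal((8, 200)) + 1j * rng.standard_normal((8, 200))
a = np.exp(1j * 2 * np.pi * freq / c * mic_pos * np.sin(np.deg2rad(20)))
damaged = base + 0.5 * np.outer(a, rng.standard_normal(200))
print(np.rad2deg(angles[np.argmax(localize_change(base, damaged, mic_pos, angles, freq))]))  # ~20
```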


Figure 1: Experimental setups. The square microphone array sitting above a single speaker source (top left). The microphone array sitting above the clamped aluminum plate that is vibrated from below (right). A close-up of the square microphone array (bottom left).

Two validation experiments were conducted. In the first, an 8×8 array of 64 microphones was used to record 5-kHz pulses from small loudspeakers at various locations on the floor (Figure 1). With three speaker sources in an arbitrary configuration, a recording was made. The volume of one source was then reduced by 20% and another measurement was made. Using the described method (with the 8×8 array subdivided and averaged over 25 4×4 subarrays), the 20% change was located precisely, in excellent agreement with computer simulations of the experiment (Figure 2). To test for actual damage, in the second experiment a 3.8-cm cut was added to a 30-cm-square aluminum plate. The plate, vibrated from below to induce sound, was recorded from above, with and without the cut. Once again using the method described here, the change, i.e. the cut, was successfully located (Figure 3) – a promising result for practical applications of the technique.

Figure 2: Results of the first experiment. The top row of images uses the proposed technique, while the bottom uses a conventional technique. A ‘subtraction’ between the two very similar acoustic measurements (far left and center left) allows for precise localization of the 20% change (center right) and excellent agreement with simulated results (far right).

Figure 3: Results of the second experiment. The two left images show vibrational measurement of the plate (vibrated around 830 Hz) with and without the added cut, showing that the cut noticeably affects the vibration. The right image shows high-resolution acoustic localization of the cut using the described technique (at 3600 Hz).

1aSP2 – Propagation effects on acoustic particle velocity sensing

Sandra L. Collier – sandra.l.collier4.civ@mail.mil, Max F. Denis, David A. Ligon, Latasha I. Solomon, John M. Noble, W.C. Kirkpatrick Alberts, II, Leng K. Sim, Christian G. Reiff, Deryck D. James
U.S. Army Research Laboratory
2800 Powder Mill Rd
Adelphi, MD 20783-1138

Madeline M. Erikson
U.S. Military Academy
West Point, NY

Popular version of paper 1aSP2, “Propagation effects on acoustic particle velocity sensing”
Presented Monday morning, 7 May 2018, 9:20-9:40 AM, Greenway H/I
175th ASA Meeting Minneapolis, MN

Left: time series of the recorded particle velocity amplitude versus time for propane cannon shots. Right: corresponding spectrogram. Upper: 100 m; lower: 400 m.

As a sound wave travels through the atmosphere, it may scatter from atmospheric turbulence. Energy is lost from the forward moving wave, and the once smooth wavefront may have tiny ripples in it if there is weak scattering, or large distortions if there is strong scattering. A significant amount of research has studied the effects of atmospheric turbulence on the sound wave’s pressure field. Past studies of the pressure field have found that strong scattering occurs when there are large turbulence fluctuations and/or the propagation range is long, both with respect to wavelength. This scattering regime is referred to as fully saturated. In the unsaturated regime, there is weak scattering and the atmospheric turbulence fluctuations and/or propagation distance are small with respect to the wavelength. The transition between the two regimes is referred to as partially saturated.

Usually, when people think of a sound wave, they think of the pressure field; after all, human ears are sophisticated pressure sensors, and microphones are pressure sensors. But a sound wave is a mechanical wave described not only by its pressure field but also by its particle velocity. The objective of our research is to examine the effects of atmospheric turbulence on the particle velocity. Particle velocity sensors (sometimes referred to as vector sensors) for use in air are relatively new, so atmospheric turbulence studies of this kind have not been conducted before. We do this statistically, because the atmosphere is a random medium. This means that every time a sound wave propagates, there may be a different outcome – a different path, a change in phase, a change in amplitude. The probability distribution function describes the set of possible outcomes.

The cover picture illustrates a typical transient broadband event (a propane cannon) recorded 100 m from the source (upper plots). The time series on the left is the recorded particle velocity versus time. The spectrogram on the right is a visualization of the frequency content and intensity of the wave through time. The sharp vertical lines across all frequencies are the propane cannon shots. We also see other noise sources: a passing airplane (between 0 and 0.5 minutes) and noise from power lines (horizontal lines). The same shots recorded at 400 m are shown in the lower plots. We notice right away the numerous additional vertical lines, most probably due to wind noise. Since the sensor is farther away, the amplitude of the sound is reduced, the higher frequencies have been attenuated, and the signal-to-noise ratio is lower.

The atmospheric conditions (low wind speeds, warm temperatures) led to convectively driven turbulence described by a von Kármán spectrum. Statistically, we found that the particle velocity had probability distributions similar to previous observations of the pressure field under similar atmospheric conditions: the unsaturated regime is observed at lower frequencies and shorter ranges, and the saturated regime at higher frequencies and longer ranges. In the figure below (left), the unsaturated regime is seen as a tight collection of points, with little variation in phase (angle around the circle) or amplitude (distance from the center). The beginning of the transition into the partially saturated regime has very little amplitude fluctuation and small phase fluctuations, and the set of observations has the shape of a comma (middle). In the saturated regime there are large variations in amplitude and phase, and the set of observations appears fully randomized – points everywhere (right).

Scatter plots of the particle velocity for observations over two days (blue – day 1; green – day 2).  From left to right, the scatter plots depict the unsaturated regime, partially saturated regime, and saturated regime.
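As a rough illustration of the kind of quantity plotted above, here is a minimal sketch (not the authors’ processing chain) that extracts the complex amplitude – the magnitude and phase – of a received tone from a recording, and computes a scintillation index, a standard measure of how saturated the fluctuations are. The frequencies and signals below are illustrative assumptions.

```python
# Minimal sketch: complex amplitude of a tone per recording, and a saturation measure.
import numpy as np

def complex_amplitude(signal, fs, f0):
    """Complex amplitude (magnitude and phase) of a tone at f0 Hz in a recording."""
    t = np.arange(len(signal)) / fs
    return 2.0 * np.mean(signal * np.exp(-2j * np.pi * f0 * t))

def scintillation_index(amps):
    """Normalized intensity variance: near 0 when unsaturated, near 1 when saturated."""
    intensity = np.abs(np.asarray(amps)) ** 2
    return np.var(intensity) / np.mean(intensity) ** 2

# Example: one synthetic 'shot' containing a 50 Hz tone of amplitude 0.8 and phase 0.3 rad.
fs = 1000
t = np.arange(fs) / fs
shot = 0.8 * np.cos(2 * np.pi * 50 * t + 0.3)
a = complex_amplitude(shot, fs, 50)
print(abs(a), np.angle(a))   # ~0.8, ~0.3
```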

The propagation environment has numerous other states that we also need to study to have a more complete picture. It is standard practice to benchmark the performance of different microphones, so as to determine sensor limitations and optimal operating conditions.  Similar studies should be done for vector field sensors once new instrumentation is available.  Vector sensors are of importance to the U.S. Army for the detection, localization, and tracking of potential threats in order to provide situational understanding and potentially life-saving technology to our soldiers. The particle velocity sensor we used was just bigger than a pencil. Including the windscreen, it was about a foot in diameter. Compare that to a microphone array that could be meters in size to accomplish the same thing.


Acknowledgement:
This research was supported in part by an appointment to the U.S. Army Research Laboratory
Research Associateship Program administered by Oak Ridge Associated Universities.

1pSP10 – Design of an Unmanned Aerial Vehicle Based on Acoustic Navigation Algorithm

Yunmeng Gong1 – (476793382@qq.com)
Huping Xu1 – (hupingxu@126.com)
Yu Hen Hu2 – (yuhen.hu@wisc.edu)
1 School of Logistic Engineering
Wuhan University of Technology
Wuhan, Hubei, China 430063
2Department of Electrical and Computer Engineering
University of Wisconsin – Madison
Madison, WI 53706 USA

Popular version of paper 1pSP10, “Design of an Unmanned Aerial Vehicle Based on Acoustic Navigation Algorithm”
Presented on Monday afternoon, December 4, 2017, 4:05-4:20 PM, Salon D
174th ASA Meeting, New Orleans

Acoustic UAV guidance is an enabling technology for future urban UAV transportation systems. When large numbers of commercial UAVs are tasked with delivering goods and services in a metropolitan area, they will need to be guided to travel in an orderly fashion along aerial corridors above streets. They will need to land on or take off from designated parking structures and obey “traffic signals” to mitigate potential collisions.

A UAV acoustic guidance system consists of a group of ground stations distributed over the operating region. When a UAV enters the region, its flight path will be guided by a regional air-traffic controller system. The UAV and the controller will communicate over a radio channel using Wi-Fi or 5G cellular Internet-of-Things protocols. The UAV’s position will be estimated from the direction-of-arrival (DoA) angles of narrowband acoustic signals.


Figure 1. UAV acoustic guidance system: (a) passive-mode acoustic guidance system; (b) active-mode acoustic guidance system

As shown in Figure 1, acoustic UAV guidance can operate in a passive self-guidance mode as well as an active guidance mode. In the passive self-guidance mode, beacons with known 3D positions emit known, distinct narrowband (harmonic) signals. A UAV passively receives these acoustic signals using an on-board microphone phased array and uses the sampled signals to estimate the DoA of each beacon signal. If the UAV is provided with the beacon stations’ 3D coordinates, it can determine its own location and heading, complementing the estimates obtained from GPS or inertial guidance systems. The advantage of the passive guidance system is that multiple UAVs can use the same group of beacon stations to estimate their own positions. The technical challenge is that each UAV must carry a bulky acoustic phased array, and the received acoustic signal suffers from strong noise interference due to the engine, propeller/rotor, and wind.

Conversely, in the active guidance mode, the UAV actively emits an omnidirectional, narrowband acoustic signal at a harmonic frequency designated by the local air-traffic controller. Each beacon station uses its local microphone phased array to estimate the DoA of the UAV’s acoustic signal. The UAV’s location, speed, and heading are then estimated by the local air-traffic controller and transmitted to the UAV. The advantage of the active guidance mode is that the UAV carries a lighter payload, consisting of an amplified speaker and related circuitry. The disadvantage is that each UAV within the region needs to generate a harmonic signal with a distinct center frequency; as the number of UAVs in the region increases, the available acoustic frequencies may become insufficient.

In this paper, we investigate key issues in the design and implementation of a passive-mode acoustic guidance system. We ask fundamental questions such as: What is the effective range of acoustic guidance? What size and configuration should the on-board phased array have? What is an efficient formulation of a direction-of-arrival estimation algorithm so that it can be implemented on the computers on board a UAV?

We conducted on-the-ground experiments and measured the sound attenuation as a function of distance and harmonic frequency. The result is shown in Figure 2 below.

Figure 2. Sound attenuation in air as a function of distance for different harmonic frequencies
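For context, a curve like the one in Figure 2 can be compared against the standard textbook propagation model: geometric (spherical) spreading plus a frequency-dependent atmospheric absorption term. A minimal sketch of that model follows; the absorption coefficient is an input that must come from tables or measurement (the value used below is illustrative only, not from the paper).

```python
# Minimal sketch: received sound level vs. range from spreading plus absorption.
import numpy as np

def received_level(spl_ref, r, r_ref=1.0, alpha_db_per_m=0.0):
    """SPL (dB) at range r metres: spherical spreading plus linear absorption."""
    r = np.asarray(r, dtype=float)
    return spl_ref - 20.0 * np.log10(r / r_ref) - alpha_db_per_m * (r - r_ref)

# Example: 100 dB at 1 m; absorption of 0.01 dB/m is an illustrative placeholder.
print(received_level(100.0, [10.0, 50.0, 100.0], alpha_db_per_m=0.01))
```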

Using a commercial UAV (a DJI Phantom model), we conducted experiments to study the frequency spectrum of the sound in different motion states, in order to identify beacon frequencies that suffer the least interference from engine and rotor noise. An example of the acoustic spectrum during take-off is shown in Figure 3 below.

Figure 3. UAV acoustic noise during take-off

We also developed a simplified direction-of-arrival estimation algorithm that achieves encouraging accuracy when implemented on an STM32F407 microcontroller, which can easily be installed on a UAV.
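To give a flavor of narrowband DoA estimation, here is a minimal two-microphone sketch based on the phase difference of the beacon tone. This is one simple possibility, not necessarily the algorithm implemented in the paper, and it assumes the microphone spacing is at most half a wavelength so the phase difference is unambiguous.

```python
# Minimal sketch: direction of a narrowband beacon from the phase difference at two mics.
import numpy as np

def doa_from_phase(sig1, sig2, fs, f0, d, c=343.0):
    """Direction of arrival (degrees from broadside) of a tone at f0 Hz."""
    t = np.arange(len(sig1)) / fs
    ref = np.exp(-2j * np.pi * f0 * t)
    a1 = np.sum(sig1 * ref)                  # complex amplitude of the tone at mic 1
    a2 = np.sum(sig2 * ref)                  # complex amplitude of the tone at mic 2
    dphi = np.angle(a2 * np.conj(a1))        # phase difference between the mics
    return np.degrees(np.arcsin(np.clip(dphi * c / (2 * np.pi * f0 * d), -1, 1)))

# Example: 2 kHz beacon arriving 25 degrees off broadside, mics 4 cm apart.
fs, f0, d, c = 48000, 2000.0, 0.04, 343.0
t = np.arange(4800) / fs
delay = d * np.sin(np.radians(25)) / c
sig1 = np.cos(2 * np.pi * f0 * t)
sig2 = np.cos(2 * np.pi * f0 * (t + delay))
print(doa_from_phase(sig1, sig2, fs, f0, d))   # ~25
```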

2pSP6 – Directive and focused acoustic wave radiation by tessellated transducers with folded curvatures

Ryan L. Harne*: harne.3@osu.edu
Danielle T. Lynd: lynd.47@osu.edu
Chengzhe Zou: zou.258@osu.edu
Joseph Crump: crump.1@themetroschool.org
201 W 19th Ave., N350 Scott Lab, Department of Mechanical and Aerospace Engineering, The Ohio State University, Columbus, OH 43210, USA
* Corresponding author

Popular version of paper 2pSP6 presented Mon afternoon, 26 June 2017
173rd ASA Meeting, Boston, Massachusetts, USA

Directed or focused acoustic wave energy is central to many applications, ranging broadly from ultrasound imaging to underwater ecosystem monitoring to voice and music projection. The interference patterns necessary to realize such directed or focused waves, guiding the radiated acoustic energy from transducer arrays to locations in space, require close control over the contribution of sound from each transducer source. Recent research has revealed advantages of mechanically reconfiguring the acoustic transducer constituents along the folding patterns of an origami-inspired tessellation, as opposed to digitally processing the signals sent to each element in a fixed configuration [1] [2] [3].

Video: Origami-inspired acoustic solutions. Credit: Harne/Lynd/Zou/Crump

One such proof-of-concept for a foldable, tessellated array of acoustic transducers is shown in Figure 1. We cut a folding pattern into piezoelectric PVDF (a type of plastic) film, which is then bonded to a polypropylene plastic substrate scored with the same folding pattern. Rather than controlling each constituent of the array, as in digital signal processing methods, driving the whole array with a single signal and mechanically reconfiguring it along the folding pattern provides a comparable means of guiding the acoustic wave energy.


Figure 1. Folding pattern for the array, where blue lines are mountain folds and red lines are valley folds. The laser-cut PVDF is bonded to polypropylene to produce the final proof-of-concept tessellated array prototype shown at right. The baffle fixture is needed to maintain the curvature and the fixed-edge boundary conditions during experiments. Credit: Harne/Lynd/Zou/Crump

To date, this concept of foldable, tessellated arrays has demonstrated that the transmission of sound in angularly narrow beams – referred to technically as the directionality of far-field wave radiation – can be adapted by orders of magnitude when the array constituents are driven by the same signal. The extent of this adaptation is dictated by the folding of the Miura-ori style of tessellated array.

Our research investigates a new form of adaptive acoustic energy delivery from foldable arrays by studying tessellated transducers that adopt folded curvatures, thus introducing the opportunity for near-field energy focusing alongside far-field directionality.

For instance, Figure 1 reveals the curvature of the proof-of-concept array of star-shaped transducer components in the partially folded state. This suggests that the array will focus sound energy to a location near its radius of curvature. The outcomes of our computational and experimental efforts show that foldable, tessellated transducers that curve upon folding offer a straightforward means for the fine, real-time control needed to beam and focus sound to specific points in space.
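The focusing effect can be illustrated with a toy calculation: if all facets are driven in phase and approximated as point sources, the positions of those sources (flat versus curved) determine whether their contributions add up coherently at a focal point. The sketch below uses a simple arc as a stand-in for the folded curvature; it is not the actual Miura or star tessellation from the paper.

```python
# Minimal sketch: in-phase point sources on a flat line vs. on an arc; the arc
# geometry makes all path lengths to its center of curvature equal, so the field
# adds coherently there (focusing), while the flat layout does not.
import numpy as np

def field_from_sources(src_xy, obs_xy, freq, c=343.0):
    """Complex pressure at observation points from monopoles driven in phase."""
    k = 2 * np.pi * freq / c
    d = np.linalg.norm(obs_xy[:, None, :] - src_xy[None, :, :], axis=-1)
    return np.sum(np.exp(1j * k * d) / d, axis=1)

n, freq = 16, 4000.0
x = np.linspace(-0.15, 0.15, n)
flat = np.stack([x, np.zeros(n)], axis=1)               # unfolded (flat) layout
theta = np.linspace(-0.5, 0.5, n)
curved = np.stack([0.3 * np.sin(theta), 0.3 * (1 - np.cos(theta))], axis=1)  # 0.3 m arc

focus = np.array([[0.0, 0.3]])                           # the arc's center of curvature
flat_amp = abs(field_from_sources(flat, focus, freq))[0]
curved_amp = abs(field_from_sources(curved, focus, freq))[0]
print(flat_amp, curved_amp)    # the curved (folded) layout gives the larger amplitude
```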

Due to the numerous applications of acoustic wave guiding, these concepts could enhance the versatility and multifunctionality of acoustic arrays by a more straightforward mechanical reconfiguration approach that controls the radiated or received wave field. Alternatively, by strategically integrating with digital signal processing methods, future studies might uncover new synergies of performance capabilities by using actively controlled, origami-inspired acoustic arrays.

References

[1] R.L. Harne, D.T. Lynd, Origami acoustics: using principles of folding structural acoustics for simple and large focusing of sound energy, Smart Materials and Structures 25, 085031 (2016).
[2] D.T. Lynd, R.L. Harne, Strategies to predict radiated sound fields from foldable, Miura-ori-based transducers for acoustic beamfolding, The Journal of the Acoustical Society of America 141, 480-489 (2017).
[3] C. Zou, R.L. Harne, Adaptive acoustic energy delivery to near and far fields using foldable, tessellated star transducers, Smart Materials and Structures 26, 055021 (2017).

2aSP5 – Using Automatic Speech Recognition to Identify Dementia in Early Stages

Roozbeh Sadeghian, J. David Schaffer, and Stephen A. Zahorian
Rsadegh1@binghamton.edu
SUNY at Binghamton
Binghamton, NY

Popular version of paper 2aSP5, “Using automatic speech recognition to identify dementia in early stages”
Presented Tuesday morning, November 3, 2015, 10:15 AM, City Terrace room
170th ASA Meeting, Jacksonville, FL

The clinical diagnosis of Alzheimer’s disease (AD) and other dementias is very challenging, especially in the early stages. It is widely believed to be underdiagnosed, at least partially because of the lack of a reliable non-invasive diagnostic test. Additionally, recruitment for clinical trials of experimental dementia therapies might be improved with a highly specific test. Although there is much active research into new biomarkers for AD, most of these methods are expensive and/or invasive, such as brain imaging (often with radioactive tracers) or taking blood or spinal fluid samples for expensive laboratory procedures.

There are good indications that dementias can be characterized by several aphasias (defects in the use of speech). This seems plausible since speech production involves many brain regions, and thus a disease that affects particular regions involved in speech processing might leave detectable fingerprints in the speech. Computerized analysis of speech signals and computational linguistics (analysis of word patterns) have progressed to the point where an automatic speech analysis system could be within reach as a tool for detecting dementia. The long-term goal is an inexpensive, short-duration, non-invasive test for dementia; one that can be administered in an office or home by clinicians with minimal training.

If a pilot study (cross-sectional design: only one sample from each subject) shows that suitable combinations of features derived from a voice sample can strongly indicate disease, then the research will move to a longitudinal design (many samples collected over time) in which sizable cohorts will be followed so that early indicators might be discovered.

A simple procedure for acquiring speech samples is to ask subjects to describe a picture (see Figure 1). Some such samples are available on the web (DementiaBank), but they were collected long ago and the audio quality is often poor. We used 140 of these older samples, but also collected 71 new samples with good-quality audio. Roughly half of the samples came from subjects with a clinical diagnosis of probable AD; the others were demographically similar and cognitively normal (NL).


Figure 1. The pictures used for recording samples: (a) the famous “cookie theft” picture used for the older samples and (b) the picture used for the newly recorded samples.

One hundred twenty-eight features were automatically extracted from the speech signals, including pauses and pitch variation (indicating emotion); word-use features were extracted from manually prepared transcripts. In addition, we had the results of a popular cognitive test, the Mini-Mental State Exam (MMSE), for all subjects. While widely used as an indicator of cognitive difficulties, the MMSE is not sufficiently diagnostic for dementia by itself. We searched for patterns with and without the MMSE, which allows for the possibility of a clinical test that combines speech with the MMSE. Multiple patterns were found using an advanced pattern discovery approach (genetic algorithms with support vector machines). The performance of two example patterns is shown in Figure 2. The training samples (red circles) were used to discover the patterns, so we expect them to perform well. The validation samples (blue) were not used for learning, only to test the discovered patterns. If we declare a subject AD when the test score is > 0.5 (the red line in Figure 2), we can see some errors: in the left panel there is one false positive (an NL case with a high test score, blue triangle) and several false negatives (AD cases with low scores, red circles).


Figure 2. Two discovered diagnostic patterns (left: with MMSE; right: without MMSE). The normal subjects are to the left in each plot (low scores) and the AD subjects to the right (high scores). No perfect pattern has yet been discovered.
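For readers curious about the classification step, here is a minimal, hypothetical sketch of the idea using a support vector machine: features are computed per subject, a classifier is trained, and new subjects are scored on a 0–1 scale with AD declared above 0.5. The data below are synthetic placeholders, and the genetic-algorithm feature selection used in the actual study is not reproduced here.

```python
# Minimal sketch: score subjects from speech-derived features with an SVM.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_train, n_features = 140, 128                 # e.g. 128 speech features per subject
X_train = rng.normal(size=(n_train, n_features))   # synthetic stand-in for real features
y_train = rng.integers(0, 2, size=n_train)         # 0 = NL, 1 = AD (synthetic labels)

model = make_pipeline(StandardScaler(), SVC(probability=True))
model.fit(X_train, y_train)

X_new = rng.normal(size=(5, n_features))            # new, unseen subjects
scores = model.predict_proba(X_new)[:, 1]           # probability-like test score in 0..1
print(["AD" if s > 0.5 else "NL" for s in scores])  # declare AD above the 0.5 threshold
```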

As mentioned above, manually prepared transcripts were used for these results, since automatic speaker-independent speech recognition is very challenging for small, highly variable data sets. To be viable, the test should be completely automatic. Accordingly, the main emphasis of the research presented at this conference is the design of an automatic speech-to-text system and an automatic pause recognizer, taking into account the special features of the type of speech used for this test of dementia.