3aPPa3 – When cognitive demand increases, does the right ear have an advantage? – Danielle Sacchinelli

When cognitive demand increases, does the right ear have an advantage?

Danielle Sacchinelli – dms0043@auburn.edu
Aurora J. Weaver – ajw0055@auburn.edu
Martha W. Wilson – paxtomw@auburn.edu
Anne Rankin Cannon – arc0073@auburn.edu
Auburn University
1199 Haley Center
Auburn, AL 36849

 

Popular version of paper 3aPPa3, “Does the right ear advantage persist in mature auditory systems when cognitive demand for processing increases?”

Presented Wednesday morning, December 6, 2017, 8:00 AM-12:00 PM, Studios Foyer

174th ASA Meeting, New Orleans

 

A dichotic listening task presents two different sound sequences simultaneously to both ears. Performance on these tasks measures selective auditory attention for each ear, either binaural separation or binaural integration (see Figure 1 for examples). Based on the anatomical model of auditory processing, the right ear has a slight advantage, compared with the left ear, on dichotic listening tasks. This is due to left brain hemispheric dominance for language, which receives direct auditory input from the right ear (i.e., strong contralateral auditory pathway; Kimura, 1967).

Clinical tests of auditory function quantify this right ear advantage for dichotic listening tasks to assess maturity of the auditory system, in addition to other clinical implications. Accurate performance on dichotic tests relies on both sensory organization and memory. As a child matures, the right ear advantage decreases until it is no longer clinically significant. However, clinically available dichotic-digits tests present sets of only 1 or 2 digits (e.g., Dichotic Digits Test; Musiek, 1983; Musiek et al., 1991) or 3 digits (i.e., Dichotic Digits, MAPA; Schow, Seikel, Brockett, & Whitaker, 2007) to each ear. See Figure 1 for the maximum task demands of clinical tests for binaural integration (instructions “B”), using a free recall protocol (Guenette, 2006).

Daily listening often requires an adult to process competing information that extends beyond six items of sensory input. This study investigated the impact of increasing cognitive demands on ear performance asymmetries (i.e., right versus left) in mature auditory systems. Forty-two participants (19- to 28-year-olds) performed dichotic binaural separation tasks (adapted from the Dspan Task; Nagaraj, 2017) for 2-, 3-, 4-, 5-, 6-, 7-, 8-, and 9-digit lists. Listeners recalled the sequence presented to one ear while ignoring the sequence presented to the opposite ear (i.e., binaural separation; directed ear protocol). See Figure 1 for an example of the experimental binaural separation tasks (i.e., digit length = 3, used for condition 2) and instructions “A” for directed ear recall.

Results in Figure 2 show a significant effect for directed ear performance as task demands increase (i.e., digit list length). The overall evaluation of the list length (Figure 2) does not reveal the impact of working memory capacity limits (i.e., maximum items that can be recalled for an ongoing task) for each participant. Therefore, a digit span was measured to estimate each participant’s simple working memory capacity. Planned comparisons for ear performance relative to a participant’s digit span (i.e., below = n-2, at span = n, and above span = n+2 digit lists, where n = digit span) evaluated the role of cognitive demand on ear asymmetries.

Planned t-test comparisons revealed a significant performance asymmetry above span (i.e., n+2). No significant differences were identified for performance relative to, or below, an individual’s simple memory capacity. This indicates the persistence of the right ear advantage in mature auditory systems when listening demands exceeded an individual’s auditory memory capacity.
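The span-relative comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the example scores, and the use of a paired-samples t statistic are hypothetical choices and are not taken from the study's actual analysis.

```python
from math import sqrt
from statistics import mean, stdev


def span_relative_lengths(digit_span, offset=2):
    """List lengths tested below, at, and above a participant's digit span (n-2, n, n+2)."""
    return {
        "below": digit_span - offset,
        "at": digit_span,
        "above": digit_span + offset,
    }


def paired_t(right_scores, left_scores):
    """Paired-samples t statistic for right-ear minus left-ear scores."""
    diffs = [r - l for r, l in zip(right_scores, left_scores)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Hypothetical participant with a digit span of 6: compare ears on 4-, 6-, and 8-digit lists.
print(span_relative_lengths(6))

# Hypothetical above-span scores (digits recalled) for three participants.
print(paired_t([3, 5, 7], [2, 3, 4]))
```

Because the study used 2- to 9-digit lists, a participant with a very long or very short span may not have a list at n+2 or n-2; the sketch does not handle that case.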

Overall, the study found the right ear continues to show better performance on dichotic listening tasks, even in mature adults. This persistent right ear advantage occurred when the number of digits in the sequence exceeded the participants’ digit span capacity. We believe such demands are a realistic aspect of everyday listening, as individuals attempt to retain sensory information in demanding listening environments. These results may help us modify existing clinical tests, or develop a new task, to more precisely reveal performance asymmetries based on an individual’s auditory working memory capacity.

Figure 1. Displays an example of dichotic digit stimuli presentation, with both “A” binaural separation tasks (i.e., directed ear) and “B” binaural integration (i.e., free recall) instructions.

Figure 2. Displays ear performance on the binaural separation task across all participants. Note: the orange box highlights the maximum demands of commercially available dichotic-digits tests; participant performance reflects a lack of asymmetry under these cognitive demands.

Figure 3. Displays participant ear performance on the binaural separation task relative to digit span.

 

  1. Kimura, D. (1967). Functional asymmetry of the brain in dichotic listening. Cortex, 3(2), 163-176.
  2. Musiek, F. (1983). Assessment of central auditory dysfunction: The dichotic digit test revisited. Ear and Hearing, 4(2), 79-83.
  3. Musiek, F., Gollegly, K., Kibbe, K., & Verkest-Lenz, S. (1991). Proposed screening test for central auditory disorders: Follow-up on the dichotic digits test. The American Journal of Otology, 12(2), 109-113.
  4. Schow, R., Seikel, A., Brockett, J., & Whitaker, M. (2007). Multiple Auditory Processing Assessment (MAPA); Test Manual 1.0 version. AUDITEC, St. Louis, MO. PDF available from http://www2.isu.edu/csed/audiology/mapa/MAPA_Manual.pdf
  5. Guenette, L.A. (2006). How to administer the Dichotic Digit Test. The Hearing Journal, 59(2), 50.
  6. Nagaraj, N. K. (2017). Working Memory and Speech Comprehension in Older Adults with Hearing Impairment. Journal of Speech Language and Hearing Research, 60(10), 2949-2964. doi: 10.1044/2017_JSLHR-H-17-0022.

2pBA3 – Semi-Automated Smart Detection of Prostate Cancer using Machine Learning and a Novel Near-Microscopic Imaging Platform – Daniel Rohrbach

Semi-Automated Smart Detection of Prostate Cancer using
Machine Learning and a Novel Near-Microscopic Imaging Platform

Daniel Rohrbach – drohrbach@RiversideResearch.org, Jonathan Mamou, and Ernest Feleppa
Lizzi Center for Biomedical Engineering, Riverside Research
New York, NY, USA, 10038

Brian Wodlinger and Jerrold Wen
Exact Imaging, Markham
Ontario, Canada, L3R 2N2

 

Popular version of paper 2pBA3, “Quantitative-ultrasound-based prostate-cancer imaging by means of a novel micro-ultrasound scanner”

Presented Tuesday, December 05, 2017, 1:45-2:00 PM, Balcony M

174th ASA meeting, New Orleans

 

Prostate cancer is the second-leading cause of male cancer-related death in the U.S., with approximately 1 in 7 men being diagnosed with prostate cancer during their lifetime[i]. Detection and diagnosis of this significant disease present a major clinical challenge because the current standard-of-care imaging method, conventional transrectal ultrasound, cannot reliably distinguish cancerous from non-cancerous prostate tissue. Therefore, prostate biopsies for definitively diagnosing cancer are currently performed in a systematic but “blind” pattern. Other imaging methods, such as MRI, have been investigated for guiding biopsies, but MRI involves complicated procedures, is costly, is poorly tolerated by most patients, and demonstrates significant variability among clinical sites and practitioners. Our study investigated sophisticated tissue-typing algorithms for possible use in a novel, fine-resolution ultrasound instrument, the ExactVu™ micro-ultrasound instrument by Exact Imaging, Markham, Ontario. The ExactVu has recently been approved for commercial sale in North America and Europe. The term micro-ultrasound refers to the near-microscopic resolution of the device. This new, fine-resolution instrument allows clinicians to visualize previously unseen features of the prostate in real time and enables them to differentiate suspicious regions of the prostate so that they can “target” biopsies to those regions. To enable more-objective interpretation of tissue features made visible by the ExactVu, a cancer-risk-identification protocol – called PRI-MUS™ (Prostate Risk Identification using Micro-Ultrasound)[ii] – has been developed and validated to distinguish benign prostate tissue from tissue that has a high probability of being cancerous based on its appearance in a micro-ultrasound image.

The paper, “High-frequency quantitative ultrasound for prostate-cancer imaging using a novel micro-ultrasound scanner,” which is being presented at the 174th Meeting of the Acoustical Society of America, shows promising results from a collaborative research project undertaken by Riverside Research, a leading biomedical research institution in New York, NY, and Exact Imaging. The paper describes an approach that applies a combination of (1) sophisticated ultrasound signal-processing methods known as quantitative ultrasound and (2) machine learning and artificial intelligence to the analysis of fine-resolution data acquired with the novel micro-ultrasound imaging platform, with the aim of automating detection of cancerous tissue in the prostate. Results of the study were very encouraging and showed a promising ability of the methods to distinguish cancerous from non-cancerous prostate tissue.

A database of 12,000 fine-resolution, micro-ultrasound images and correlated biopsy histology has been developed.  The new algorithm for automated detection continues to evolve and is applied to this growing data set.

Future clinical application of the algorithms implemented in the ExactVu would involve scanning a patient with indications of prostate cancer (e.g., as a result of a transrectal palpation or a high level of prostate-specific antigen in the blood) to identify regions of the gland that are sufficiently suspicious for cancer to warrant a biopsy.  As the scan proceeds, the algorithm continuously analyzes the ultrasound signals and automatically indicates to the examining urologist any regions that have a significant risk of being cancerous.  The urologist evaluates the indicated region and makes a clinical judgement on whether the region in fact warrants a biopsy.

The results of this study show an encouraging ability of ultrasound-signal processing and the machine-learning algorithm together with the novel micro-ultrasound instrumentation to depict regions of the prostate that are cancerous with high reliability.  The study demonstrates a promising potential of the algorithms and micro-ultrasound to improve targeting of biopsies, to increase cancer-detection rates, to avoid unnecessary biopsies and associated risks, to support focal therapy more effectively, and consequently to achieve better patient outcomes.

 

Referenced abstract:
High-frequency quantitative ultrasound for prostate-cancer imaging using a novel micro-ultrasound scanner

[i] American Cancer Society: https://cancerstatisticscenter.cancer.org/?_ga=2.177940773.1025752599.1511161127-1043893878.1511161127#!/

[ii] Ghai S, et al: Assessing Cancer Risk on Novel 29 MHz Micro-Ultrasound Images of the Prostate: Creation of the Micro-Ultrasound Protocol for Prostate Risk Identification. J. Urol. 2016; 196: 562–569.

1pAO9 – The Acoustic Properties of Crude Oil – Scott Loranger

The Acoustic Properties of Crude Oil

Scott Loranger – sloranger@ccom.unh.edu
Department of Earth Science
University of New Hampshire
Durham, NH, United States

Christopher Bassett – chris.bassett@noaa.gov
Alaska Fisheries Science Center
National Marine Fisheries Service
Seattle, WA, United States

Justin P. Cole – jpq68@wildcats.unh.edu
Department of Chemistry
Colorado State University
Fort Collins, CO, United States

Thomas C. Weber – Weber@ccom.unh.edu
Department of Mechanical Engineering
University of New Hampshire
Durham, NH, United States

 

Popular version of paper 1pAO9, “The Acoustic Properties of Three Crude Oils at Oceanographically Relevant Temperatures and Pressures”

Presented Monday afternoon, December 04, 2017, 3:35-3:50 PM, Balcony M

174th ASA Meeting, New Orleans, LA

 

The difficulty of detecting and quantifying oil in the ocean has limited our understanding of the fate and environmental impact of oil from natural seeps and man-made spills. Oil on the surface can be detected by satellite (Figure 1) and studied with optical instrumentation; however, as researchers look deeper to study oil as it rises through the water column, the rapid attenuation of light in the ocean limits the usefulness of these systems. Active sonar – where an acoustic transmitter generates a pulse of sound and a receiver listens for the sound reflected from an object – takes advantage of the low attenuation of sound in the ocean to detect things farther away than optical instruments can. However, oil is difficult to detect acoustically because oil and seawater have similar physical properties. The amount of sound reflected from an object depends on the object’s size, shape, and a physical property called the acoustic impedance – the product of the density and sound speed of the material being measured. When an object has an acoustic impedance similar to that of the medium surrounding it, the object reflects relatively little sound. The acoustic impedances of oil (which differ by type of oil) and seawater are often very similar. In fact, under certain conditions oil droplets could be acoustically invisible. To study oil acoustically, we need to better understand the physical properties that affect its acoustic impedance.
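To make the impedance argument concrete, the sketch below computes acoustic impedance as density times sound speed and then the textbook normal-incidence reflection coefficient for a flat boundary between two fluids. A flat boundary is a simplification (real droplets scatter sound in a size- and shape-dependent way), and the densities and sound speeds used here are illustrative assumptions, not measurements from this study.

```python
def acoustic_impedance(density_kg_m3, sound_speed_m_s):
    """Characteristic acoustic impedance: density times sound speed."""
    return density_kg_m3 * sound_speed_m_s


def reflection_coefficient(z_medium, z_object):
    """Normal-incidence pressure reflection coefficient at a flat interface."""
    return (z_object - z_medium) / (z_object + z_medium)


# Illustrative (assumed) values, not data from the paper.
z_seawater = acoustic_impedance(1025.0, 1500.0)
z_oil = acoustic_impedance(900.0, 1400.0)
z_air = acoustic_impedance(1.2, 343.0)

# Oil in seawater: impedances are close, so only a small fraction of the pressure is reflected.
print(reflection_coefficient(z_seawater, z_oil))

# Air in seawater: impedances are very different, so nearly all of the sound is reflected.
print(reflection_coefficient(z_seawater, z_air))
```

When the two impedances are equal, the coefficient is exactly zero, which is the "acoustically invisible" case mentioned in the text.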

Most measurements of the density and sound speed of oil come from oil exploration research, which focuses on studying oil under reservoir conditions – the high temperatures and pressures associated with oil deep underground. As oil cools to oceanographically relevant temperatures, it can transition from a liquid to a waxy semisolid. This transition may result in significant changes to the acoustic properties of oil that would not be predicted by measurements made at reservoir conditions. To inform models of acoustic scattering from oil and produce quantitatively meaningful measurements, it is necessary to have well-understood properties at relevant temperatures and pressures. Density and sound speed can be measured directly, while the shape of an oil droplet can be predicted from the density and viscosity. Density and viscosity determine how quickly a droplet will rise and how the drag force of the surrounding water will modify its shape. Droplets can range from spheres to more pancake-like shapes, similar to what one would produce by pushing down on an inflated balloon.

To better understand these important properties, we obtained samples of three different crude oils. Each sample was sent for “fingerprinting” to identify differences in the molecular composition of the oils. “Fingerprinting” is a technique used by oil exploration scientists and spill responders to identify different crude oils. Measurements of the sound speed, density, and viscosity were made from -10°C (14°F) to 30°C (86°F). A sound speed chamber was specifically designed to measure sound speed at the same temperature range but with the added effects of pressure (0 to 2500 psi – equivalent to approximately 1700 m depth, deeper than the Deepwater Horizon well).
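The pressure-to-depth equivalence quoted above can be checked with the hydrostatic relation (depth = pressure / (density × g)). The sketch assumes a constant seawater density of 1025 kg/m³ and g = 9.81 m/s², and it ignores the compressibility of seawater; the function name is hypothetical.

```python
PSI_TO_PA = 6894.757  # pascals per pound-force per square inch


def depth_for_gauge_pressure_m(pressure_psi, density_kg_m3=1025.0, g_m_s2=9.81):
    """Approximate water depth whose hydrostatic (gauge) pressure equals pressure_psi."""
    return pressure_psi * PSI_TO_PA / (density_kg_m3 * g_m_s2)


# 2500 psi works out to roughly 1700 m under these assumptions.
print(round(depth_for_gauge_pressure_m(2500.0)))
```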

Light, medium, and heavy crude oils were tested. Each of these is typically defined by its American Petroleum Institute (API) gravity. API gravity is a common descriptor of oils and is a measure of the density of oil relative to water. The properties of the medium and heavy crude oils are shown in Figure 2. The sound speed curves differ in both amplitude and shape, while the viscosity curves differ only in amplitude, suggesting that the changes in the shape of the sound speed curve may not be related to the viscosity. Measurements of the heavy oil are currently limited to temperatures above 5°C because below that temperature it becomes very difficult to transmit sound through the oil. Part of this ongoing research is to develop new techniques to measure sound speed, and to use these techniques to extend our measurements of heavy oils to cold temperatures similar to those found in Arctic regions, where oil can be trapped in ice. By better understanding these physical properties of oil, the methods and models used to detect and quantify oil in the marine environment can be improved.
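For readers unfamiliar with API gravity, the standard definition is API = 141.5 / SG − 131.5, where SG is the oil's specific gravity relative to water at 60°F. The sketch below applies that formula; the light/medium/heavy cut-offs in it are commonly quoted approximate values and are an assumption here, since the article does not say which thresholds were used.

```python
def api_gravity(specific_gravity):
    """API gravity from specific gravity at 60 degrees F (water has API gravity 10)."""
    return 141.5 / specific_gravity - 131.5


def classify_crude(api):
    """Rough class by API gravity; thresholds are commonly quoted, approximate values."""
    if api > 31.1:
        return "light"
    if api >= 22.3:
        return "medium"
    return "heavy"


# Hypothetical specific gravities, chosen only to illustrate the three classes.
for sg in (0.85, 0.90, 0.95):
    api = api_gravity(sg)
    print(sg, round(api, 1), classify_crude(api))
```

Note that a higher API gravity means a lighter (less dense) oil, so the scale runs opposite to density.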

Figure 1: Satellite image of surface oil slicks from natural seeps.

Figure 2: Experimental measurements of the physical properties of a Medium and Heavy crude oil.

4pSC11 – The role of familiarity in audiovisual speech perception – Chao-Yang Lee

The role of familiarity in audiovisual speech perception

Chao-Yang Lee – leec1@ohio.edu
Margaret Harrison – mh806711@ohio.edu
Ohio University
Grover Center W225
Athens, OH 45701

Seth Wiener – sethw1@cmu.edu
Carnegie Mellon University
160 Baker Hall, 5000 Forbes Avenue
Pittsburgh, PA 15213

 

Popular version of paper 4pSC11, “The role of familiarity in audiovisual speech perception”

Presented Thursday afternoon, December 7, 2017, 1:00-4:00 PM, Studios Foyer

174th ASA Meeting, New Orleans

 

When we listen to someone talk, we hear not only the content of the spoken message, but also the speaker’s voice carrying the message. Although understanding content does not require identifying a specific speaker’s voice, familiarity with a speaker has been shown to facilitate speech perception (Nygaard & Pisoni, 1998) and spoken word recognition (Lee & Zhang, 2017).

Because we often communicate with a visible speaker, what we hear is also affected by what we see. This is famously demonstrated by the McGurk effect (McGurk & MacDonald, 1976). For example, an auditory “ba” paired with a visual “ga” usually elicits a perceived “da” that is not present in the auditory or the visual input.

Since familiarity with a speaker’s voice affects auditory perception, does familiarity with a speaker’s face similarly affect audiovisual perception? Walker, Bruce, and O’Malley (1995) found that familiarity with a speaker reduced the occurrence of the McGurk effect. This finding supports the “unity” assumption of intersensory integration (Welch & Warren, 1980), but challenges the proposal that processing facial speech is independent of processing facial identity (Bruce & Young, 1986; Green, Kuhl, Meltzoff, & Stevens, 1991).

In this study, we explored audiovisual speech perception by investigating how familiarity with a speaker affects the perception of English fricatives “s” and “sh”. These two sounds are useful because they contrast visibly in lip rounding. In particular, the lips are usually protruded for “sh” but not “s”, meaning listeners can potentially identify the contrast based on visual information.

Listeners were asked to watch/listen to stimuli that were audio-only, visual-only, audiovisual-congruent, or audiovisual-incongruent (e.g., audio “save” paired with visual “shave”). The listeners’ task was to identify whether the first sound of the stimuli was “s” or “sh”. We tested two groups of native English listeners – one familiar with the speaker who produced the stimuli and one unfamiliar with the speaker.

The results showed that listeners familiar with the speaker identified the fricatives faster in all conditions (Figure 1) and more accurately in the visual-only condition (Figure 2). That is, listeners familiar with the speaker were more efficient in identifying the fricatives overall, and were more accurate when visual input was the only source of information.
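Accuracy in Figure 2 is reported as d’, the sensitivity index from signal detection theory, defined as z(hit rate) − z(false-alarm rate). The sketch below shows one common way to compute it. Treating “s” responses to “s” stimuli as hits and “s” responses to “sh” stimuli as false alarms, and applying a log-linear correction to avoid rates of exactly 0 or 1, are assumptions for illustration; the article does not describe how d’ was computed.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate), log-linear corrected."""
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal cumulative distribution
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts: 50 "s" trials and 50 "sh" trials.
print(d_prime(hits=45, misses=5, false_alarms=5, correct_rejections=45))

# Chance performance gives d' = 0.
print(d_prime(hits=25, misses=25, false_alarms=25, correct_rejections=25))
```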

We also examined whether visual familiarity affects the occurrence of the McGurk effect. Listeners were asked to identify syllable-initial stops (“b”, “d”, “g”) from stimuli that were audiovisual-congruent or incongruent (e.g., audio “ba” paired with visual “ga”). A blended (McGurk) response was indicated by a “da” response to an auditory “ba” paired with a visual “ga”.

Contrary to the “s”-“sh” findings reported earlier, the results from our identification task showed no difference between the familiar and unfamiliar listeners in the proportion of McGurk responses. This finding did not replicate Walker, Bruce, and O’Malley (1995).

In sum, familiarity with a speaker facilitated the speed of identifying fricatives from audiovisual stimuli. Familiarity also improved the accuracy of fricative identification when visual input was the only source of information. Although we did not find an effect of familiarity on the McGurk responses, our findings from the fricative task suggest that processing audiovisual speech is affected by speaker identity.

Figure 1- Reaction time of fricative identification from stimuli that were audio-only, visual-only, audiovisual-congruent, or audiovisual-incongruent. Error bars indicate 95% confidence intervals.

 

Figure 2- Accuracy of fricative identification (d’) from stimuli that were audio-only, visual-only, audiovisual-congruent, or audiovisual-incongruent (e.g., audio “save” paired with visual “shave”). Error bars indicate 95% confidence intervals.

Figure 3- Proportion of McGurk response (“da” response to audio “ba” paired with visual “ga”).

 

Video 1- Example of an audiovisual-incongruent stimulus (audio “save” paired with visual “shave”).

 

Video 2- Example of an audiovisual-incongruent stimulus (audio “ba” paired with visual “ga”).

 

References:

 

Bruce, V., & Young, A. (1986). Understanding face recognition. British Journal of Psychology, 77, 305-327.

Green, K. P., Kuhl, P. K., Meltzoff, A. N., & Stevens, E. B. (1991). Integrating speech information across talkers, gender, and sensory modality: Female faces and male voices in the McGurk effect. Perception & Psychophysics, 50, 524-536.

Lee, C.-Y., & Zhang, Y. (in press). Processing lexical and speaker information in repetition and semantic/associative priming. Journal of Psycholinguistic Research.

McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746-748.

Nygaard, L. C., & Pisoni, D. B. (1998). Talker-specific learning in speech perception. Perception & Psychophysics, 60, 355-376.

Walker, S., Bruce, V., & O’Malley, C. (1995). Facial identity and facial speech processing: Familiar faces and voices in the McGurk effect. Perception & Psychophysics, 57, 1124-1133.

Welch, R. B., & Warren, D. H. (1980). Immediate perceptual response to intersensory discrepancy. Psychological Bulletin, 88, 638-667.

 

 

4aPP4 – Listener’s body rotation and sound duration improve sound localization accuracy or not? – Akio Honda

Listener’s body rotation and sound duration improve sound localization accuracy or not?

Akio Honda – honda@yamanash-eiwa.ac.jp
Yamanashi-Eiwa College
888 Yokone-machi, Kofu
Yamanashi, Japan 400-8555

 

Popular version of paper 4aPP4, “Effects of listener’s whole-body rotation and sound duration on sound localization accuracy”

Presented Thursday morning, December 7, 2017, 9:45-10:00 AM, Studio 4

174th ASA Meeting, New Orleans

 

Sound localization is an important ability that helps make daily life safe and rich. A listener’s head or body movement is known to facilitate sound localization because it creates dynamic changes in the information reaching each ear [1–4]. However, earlier reports have described that sound localization accuracy deteriorates during a listener’s head rotation [5–7]. Moreover, the facilitative effects of a listener’s movement differ depending on the sound features [3–4]. The interaction between a listener’s movement and sound features therefore remains unclear. For this study, we used a digitally controlled spinning chair to assess the effects of a listener’s whole-body rotation and sound duration on horizontal sound localization accuracy.

In this experiment, the listeners were 12 adults with normal hearing. Stimuli were 1/3-octave band noise bursts (center frequency = 1 kHz, SPL = 65 dB) of 50, 200, and 1000 ms duration. Each stimulus was presented from one loudspeaker in a circular array (1.2 m radius) of 25 loudspeakers separated by 2.5 deg. Listeners could not see the loudspeakers because an acoustically transparent curtain was placed between the listener and the array, and the space inside the curtain was kept brighter than the space outside. We assigned numbers to azimuth angles at 1.25 deg intervals: the number zero was 31.25 deg to the left; the number 25 was in front of the listener; and the number 50 was 31.25 deg to the right. These numbers were displayed on the curtain to facilitate responses. Listeners sat on the spinning chair at the center of the circle and were asked to report the number corresponding to the position of the presented stimulus (see Fig. 1).
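The response scale described above maps directly onto azimuth angles: number 25 is straight ahead, and each step is 1.25 deg. The sketch below encodes that mapping. The sign convention (negative = left) and the assumption that the 25 loudspeakers are centered on the front direction are illustrative choices, and the function names are hypothetical.

```python
STEP_DEG = 1.25    # azimuth spacing between consecutive response numbers
FRONT_NUMBER = 25  # response number located directly in front of the listener


def number_to_azimuth_deg(number):
    """Convert a response number (0-50) to azimuth; negative is left, positive is right."""
    if not 0 <= number <= 50:
        raise ValueError("response number must be between 0 and 50")
    return (number - FRONT_NUMBER) * STEP_DEG


def loudspeaker_azimuths_deg(count=25, separation_deg=2.5):
    """Azimuths of the loudspeakers, assuming the array is centered on the front."""
    half = (count - 1) / 2
    return [(i - half) * separation_deg for i in range(count)]


print(number_to_azimuth_deg(0), number_to_azimuth_deg(25), number_to_azimuth_deg(50))
print(loudspeaker_azimuths_deg()[0], loudspeaker_azimuths_deg()[-1])
```

Under these assumptions the loudspeakers span 30 deg to either side, slightly inside the 31.25 deg range of the response scale.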

In the chair-still condition, listeners faced forward with the head aligned frontward (0 deg), and the stimulus was presented from one loudspeaker of the circular array. In the chair-rotation condition, listeners initially faced 15 deg to the left or 15 deg to the right. The chair then rotated 30 deg, clockwise when the listener started 15 deg to the left and counterclockwise when the listener started 15 deg to the right. During the rotation, at the moment the listener’s head faced the front (0 deg), the stimulus was presented from one of the loudspeakers in the circular array.

We analyzed the angular errors in the horizontal plane. The angular errors were calculated as the difference between the perceptually localized position and the physical target position. Figure 2 depicts the mean horizontal sound localization performance.
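As a rough illustration of this error measure, the sketch below converts each reported number to an azimuth (1.25 deg per step, with 25 at the front) and subtracts the target azimuth. Whether the analysis used signed or absolute errors is not stated, so averaging absolute errors is an assumption here, and the trial data are made up.

```python
from statistics import mean


def angular_error_deg(response_number, target_azimuth_deg, step_deg=1.25, front_number=25):
    """Signed error: perceived azimuth minus physical target azimuth, in degrees."""
    perceived_deg = (response_number - front_number) * step_deg
    return perceived_deg - target_azimuth_deg


def mean_absolute_error_deg(trials):
    """Mean unsigned angular error over (response_number, target_azimuth_deg) pairs."""
    return mean(abs(angular_error_deg(number, target)) for number, target in trials)


# Hypothetical trials: (reported number, true loudspeaker azimuth in degrees).
trials = [(28, 2.5), (21, -5.0), (29, 7.5)]
print(mean_absolute_error_deg(trials))
```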

Our results demonstrated that sound localization accuracy was better in the chair-rotation condition than in the chair-still condition. Moreover, a significant effect of sound duration was observed; accuracy appeared to be worst for the 200 ms stimuli. However, the interaction between test condition and sound duration was not significant.

These findings suggest that sound localization performance might be improved if listeners are able to obtain dynamic auditory information from their movement. Furthermore, the difference in duration of the target sound was not crucially important for sound localization accuracy. Of course, other explanations are possible. For instance, listeners might be better able to localize a shorter sound (less than 50 ms), whereas an intermediate duration such as 200 ms might not provide effective dynamic information to facilitate sound localization. Irrespective of the interpretation, our results provide valuable suggestions for future studies undertaken to elucidate the interaction between a listener’s movement and sound duration.

 

Fig. 1 Outline of the loudspeaker array system.

Fig. 2 Results of angular error in the horizontal plane

 

References:

  1. H. Wallach, “On sound localization,” J. Acoust. Soc. Am., 10, 270–274 (1939).
  2. A. Honda, H. Shibata, S. Hidaka, J. Gyoba, Y. Iwaya, and Y. Suzuki, “Effects of Head Movement and Proprioceptive Feedback in Training of Sound Localization,” i-Perception, 4, 253–264 (2013).
  3. Y. Iwaya, Y. Suzuki and D. Kimura, “Effects of head movement on front-back error in sound localization,” Acoust. Sci. Technol., 24, 322–324 (2003).
  4. S. Perrett and W. Noble, “The contribution of head motion cues to localization of low-pass noise,” Percept. Psychophys., 59, 1018–1026 (1997).
  5. J. Cooper, S. Carlile and D. Alais, “Distortions of auditory space during rapid head turns,” Exp. Brain. Res., 191, 209–219 (2008).
  6. J. Leung, D. Alais and S. Carlile, “Compression of auditory space during rapid head turns,” Proc. Natl. Acad. Sci. U.S.A., 105, 6492–6497 (2008).
  7. A. Honda, K. Ohba, Y. Iwaya, and Y. Suzuki, “Detection of sound image movement during horizontal head rotation,” i-Perception, 7, 2041669516669614 (2016).

 

1pSP10 – Design of an Unmanned Aerial Vehicle Based on Acoustic Navigation Algorithm – Yunmeng Gong

Design of an Unmanned Aerial Vehicle Based on Acoustic Navigation Algorithm

Yunmeng Gong1 – (476793382@qq.com)
Huping Xu1 – (hupingxu@126.com)
Yu Hen Hu2 – (yuhen.hu@wisc.edu)
1 School of Logistic Engineering
Wuhan University of Technology
Wuhan, Hubei, China 430063
2 Department of Electrical and Computer Engineering
University of Wisconsin – Madison
Madison, WI 53706 USA

Popular version of paper 1pSP10, “Design of an Unmanned Aerial Vehicle Based on Acoustic Navigation Algorithm”

Presented on Monday afternoon, December 4, 2017, 4:05-4:20 PM, Salon D

174th ASA Meeting, New Orleans

 

Acoustic UAV guidance is an enabling technology for future urban UAV transportation systems. When large numbers of commercial UAVs are tasked to deliver goods and services in a metropolitan area, they need to be guided to travel in an orderly fashion along aerial corridors above streets. They will need to land at or take off from designated parking structures and obey “traffic signals” to mitigate potential collisions.

A UAV acoustic guidance system consists of a group of ground stations distributed over the operating region. When a UAV enters the system, its flight path will be under the guidance of a regional air-traffic controller system. The UAV and the controller will communicate via a radio channel using Wi-Fi or 5G cellular network Internet of Things protocols. The UAV’s position will be estimated by estimating the direction-of-arrival (DoA) angles of narrow-band acoustic signals.

As shown in Figure 1, acoustic UAV guidance can operate in a passive self-guidance mode as well as an active guidance mode. In the passive self-guidance mode, beacons with known 3D positions emit known, distinct narrow-band (harmonic) signals. A UAV passively receives these acoustic signals using an on-board microphone phased array and uses the sampled signals to estimate the direction-of-arrival (DoA) of each beacon’s harmonic signal. If the UAV is provided with the beacon stations’ 3D coordinates, it can determine its own location and heading, complementing those estimated using GPS or inertial guidance systems. The advantage of the passive guidance system is that multiple UAVs can use the same group of beacon stations to estimate their own positions. The technical challenge is that each UAV must carry a bulky acoustic phased array, and the received acoustic signal will suffer from strong noise interference from the engine, propeller/rotor, and wind.
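The idea of locating a UAV from beacon directions can be illustrated with a deliberately simplified 2D triangulation. The sketch assumes the UAV already knows its heading, so each DoA can be expressed as an absolute bearing (the angle of the UAV-to-beacon direction, measured counterclockwise from the x-axis), and it uses exactly two beacons with noise-free bearings. It is not the algorithm used in the paper, and all names are hypothetical.

```python
from math import atan2, cos, sin


def locate_from_two_bearings(beacon1, bearing1_rad, beacon2, bearing2_rad):
    """Intersect two bearing lines to recover the receiver's 2D position.

    Each bearing is the absolute angle of the direction from the receiver
    toward the beacon (assumes the receiver's heading is already known).
    """
    u1 = (cos(bearing1_rad), sin(bearing1_rad))
    u2 = (cos(bearing2_rad), sin(bearing2_rad))
    dx, dy = beacon1[0] - beacon2[0], beacon1[1] - beacon2[1]
    # Solve r1*u1 - r2*u2 = beacon1 - beacon2 for the range r1 (Cramer's rule).
    det = u2[0] * u1[1] - u1[0] * u2[1]
    if abs(det) < 1e-12:
        raise ValueError("bearings are parallel; position is not determined")
    r1 = (u2[0] * dy - dx * u2[1]) / det
    return (beacon1[0] - r1 * u1[0], beacon1[1] - r1 * u1[1])


# Hypothetical layout: true UAV position (2, 1), beacons at (8, 5) and (-3, 7).
uav = (2.0, 1.0)
b1, b2 = (8.0, 5.0), (-3.0, 7.0)
theta1 = atan2(b1[1] - uav[1], b1[0] - uav[0])
theta2 = atan2(b2[1] - uav[1], b2[0] - uav[0])
print(locate_from_two_bearings(b1, theta1, b2, theta2))
```

A practical system would work in 3D, combine more than two beacons, and weight the noisy DoA estimates, for example with a least-squares fit.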

Conversely, in an active guidance mode, the UAV will actively emit an omni-directionally transmitted, narrow-band acoustic signal using a harmonic frequency designated by the local air-traffic controller. Each beacon station will use its local microphone phased array to estimate the DoA of the UAV’s acoustic signal. The UAV’s location, speed, and heading will then be estimated by the local air-traffic controller and transmitted to the UAV. The advantage of the active guidance mode is that the UAV has a lighter payload, which consists of an amplified speaker and related circuitry. The disadvantage of this approach is that each UAV within the region needs to be able to generate harmonic signals with a distinct center frequency. As the number of UAVs within the region increases, available acoustic frequencies may be insufficient.

In this paper, we investigate key issues relating to the design and implementation of a passive mode acoustic guidance system. We ask fundamental questions such as: What is the effective range of acoustic guidance? What are suitable sizes and configurations of the on-board phased array? What is an efficient formulation of a direction-of-arrival estimation algorithm so that it can be implemented on the computers on board a UAV?

We conducted on-the-ground experiments and measured the sound attenuation as a function of distance and harmonic frequency. The result is shown in Figure 2 below.
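For orientation, a common first-order model of outdoor sound attenuation combines spherical spreading (a 20·log10 distance term) with a frequency-dependent atmospheric absorption term. The sketch below implements that generic model. It is not the measured result in Figure 2, and the absorption coefficient is a placeholder assumption, since real values depend on frequency, temperature, and humidity.

```python
from math import log10


def received_level_db(source_level_db, distance_m, absorption_db_per_m=0.0, ref_distance_m=1.0):
    """Level at distance_m given the level at ref_distance_m.

    Uses spherical spreading plus linear atmospheric absorption (a generic textbook model).
    """
    spreading_loss = 20.0 * log10(distance_m / ref_distance_m)
    absorption_loss = absorption_db_per_m * (distance_m - ref_distance_m)
    return source_level_db - spreading_loss - absorption_loss


# Hypothetical beacon producing 100 dB at 1 m; absorption coefficient is a placeholder.
for distance in (10.0, 50.0, 100.0):
    print(distance, round(received_level_db(100.0, distance, absorption_db_per_m=0.01), 1))
```

Because absorption generally grows with frequency, higher-frequency beacons would lose level faster with distance in this model.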

Using a commercial UAV (DJI Phantom model), we conducted experiments to study the frequency spectrum of its sound in different motion states, in order to identify beacon frequencies that are least affected by engine sound and noise. An example of the acoustic spectrum during take-off is shown in Figure 3 below.

We also developed a simplified direction-of-arrival estimation algorithm that achieves encouraging accuracy when implemented on an STM32F407 microcontroller, which can easily be installed on a UAV.
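The authors' simplified algorithm is not described here, but the basic principle of narrow-band DoA estimation can be shown with the textbook two-microphone case: a plane wave arriving at angle θ from broadside produces a phase difference of 2πf·d·sin(θ)/c between microphones spaced d apart. The sketch below inverts that relation. The function name, the speed of sound, and the example numbers are assumptions, and this is a stand-in illustration rather than the paper's method.

```python
from math import asin, degrees, pi


def doa_from_phase_deg(phase_diff_rad, freq_hz, mic_spacing_m, sound_speed_m_s=343.0):
    """Arrival angle from broadside (degrees) for a narrow-band plane wave.

    Valid only when mic_spacing_m is at most half a wavelength; otherwise the
    phase difference is ambiguous (spatial aliasing).
    """
    sin_theta = sound_speed_m_s * phase_diff_rad / (2.0 * pi * freq_hz * mic_spacing_m)
    if abs(sin_theta) > 1.0:
        raise ValueError("phase difference is not physically consistent with this geometry")
    return degrees(asin(sin_theta))


# Hypothetical 1 kHz beacon and 10 cm spacing: simulate a 30 degree arrival, then recover it.
freq, spacing, c = 1000.0, 0.10, 343.0
simulated_phase = 2.0 * pi * freq * spacing * 0.5 / c  # sin(30 deg) = 0.5
print(doa_from_phase_deg(simulated_phase, freq, spacing))
```

A real array would use more microphones and average over many samples to cope with the rotor and wind noise described earlier.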

(a) passive mode acoustic guidance system

(b) active mode acoustic guidance system

 

Figure 1. UAV acoustic guidance system

Figure 2. Sound attenuation in air as a function of distance for different harmonic frequencies

 

Figure 3. UAV acoustic noise during take-off