1aSP2 – Propagation effects on acoustic particle velocity sensing

Sandra L. Collier – sandra.l.collier4.civ@mail.mil, Max F. Denis, David A. Ligon, Latasha I. Solomon, John M. Noble, W.C. Kirkpatrick Alberts, II, Leng K. Sim, Christian G. Reiff, Deryck D. James
U.S. Army Research Laboratory
2800 Powder Mill Rd
Adelphi, MD 20783-1138

Madeline M. Erikson
U.S. Military Academy
West Point, NY

Popular version of paper 1aSP2, “Propagation effects on acoustic particle velocity sensing”
Presented Monday morning, 7 May 2018, 9:20-9:40 AM, Greenway H/I
175th ASA Meeting Minneapolis, MN

Left: recorded particle velocity amplitude versus time for propane cannon shots. Right: corresponding spectrogram. Upper: 100 m; lower: 400 m.

As a sound wave travels through the atmosphere, it may scatter from atmospheric turbulence. Energy is lost from the forward-moving wave, and the once smooth wavefront may have tiny ripples in it if there is weak scattering, or large distortions if there is strong scattering. A significant amount of research has studied the effects of atmospheric turbulence on the sound wave’s pressure field. Past studies of the pressure field have found that strong scattering occurs when there are large turbulence fluctuations and/or the propagation range is long, both with respect to wavelength. This scattering regime is referred to as fully saturated. In the unsaturated regime, there is weak scattering, and the atmospheric turbulence fluctuations and/or propagation distance are small with respect to the wavelength. The transition between the two regimes is referred to as partially saturated.

Usually, when people think of a sound wave, they think of the pressure field; after all, human ears are sophisticated pressure sensors, and microphones are pressure sensors too. But a sound wave is a mechanical wave described not only by its pressure field but also by its particle velocity. The objective of our research is to examine the effects of atmospheric turbulence on the particle velocity. Particle velocity sensors (sometimes referred to as vector sensors) for use in air are relatively new, and as such, atmospheric turbulence studies of the particle velocity have not been conducted before. We do this statistically, as the atmosphere is a random medium. This means that every time a sound wave propagates, there may be a different outcome – a different path, a change in phase, a change in amplitude. The probability distribution function describes the set of possible outcomes.

The cover picture illustrates a typical transient broadband event (propane cannon) recorded 100 m away from the source (upper plots). The time series on the left is the recorded particle velocity versus time. The spectrogram on the right is a visualization of the frequency and intensity of the wave through time. The sharp vertical lines across all frequencies are the propane cannon shots. We also see other noise sources: a passing airplane (between 0 and 0.5 minutes) and noise from power lines (horizontal lines). The same shots recorded at 400 m are shown in the lower plots. We notice right away that there are numerous additional vertical lines – most probably due to wind noise. Since the sensor is further away, the amplitude of the sound is reduced, the higher frequencies have attenuated, and the signal-to-noise ratio is lower.
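A spectrogram of the kind shown in the cover figure can be produced with standard signal-processing tools. Below is a minimal sketch in Python using SciPy, run on a synthetic stand-in signal (background noise plus one short broadband "shot"), not on the actual recordings:

```python
import numpy as np
from scipy.signal import spectrogram

# Synthetic stand-in for a recorded particle velocity channel:
# background noise plus a short broadband transient, loosely
# mimicking a propane cannon shot (illustrative only).
fs = 8000                        # sample rate in Hz (assumed)
t = np.arange(0, 10.0, 1 / fs)
signal = 0.01 * np.random.randn(t.size)
shot = (t > 5.0) & (t < 5.01)    # 10 ms broadband transient
signal[shot] += np.random.randn(shot.sum())

# Frequency-vs-time picture of the event, as in the cover figure.
f, seg_t, Sxx = spectrogram(signal, fs=fs, nperseg=1024, noverlap=512)
Sxx_dB = 10 * np.log10(Sxx + 1e-12)   # intensity in dB for display
print(Sxx_dB.shape)   # (frequency bins, time segments)
```

A transient like the simulated shot appears as a vertical stripe spanning all frequencies in `Sxx_dB`, exactly the signature described for the cannon shots.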

The atmospheric conditions (low wind speeds, warm temperatures) led to convectively driven turbulence described by a von Kármán spectrum. Statistically, we found that the particle velocity had probability distributions similar to previous observations of the pressure field under similar atmospheric conditions: the unsaturated regime is observed at lower frequencies and shorter ranges, and the saturated regime is observed at higher frequencies and longer ranges. In the figure below (left), the unsaturated regime is seen as a tight collection of points, with little variation in phase (angle along the circle) or amplitude (distance from the center). At the beginning of the transition into the partially saturated regime, there are very small amplitude fluctuations and small phase fluctuations, and the set of observations has the shape of a comma (middle). In the saturated regime, there are large variations in amplitude and phase, and the set of observations appears fully randomized – points everywhere (right).

Scatter plots of the particle velocity for observations over two days (blue – day 1; green – day 2). From left to right, the scatter plots depict the unsaturated, partially saturated, and saturated regimes.
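The degree of saturation is often quantified by the scintillation index, the normalized variance of the wave intensity. The sketch below is our illustration (not the authors' analysis): synthetic complex field samples for the two extreme regimes, showing how the index separates them.

```python
import numpy as np

def scintillation_index(u):
    """Normalized intensity variance of complex field samples u.
    Values much less than 1 indicate the unsaturated regime;
    values approaching 1 indicate full saturation."""
    intensity = np.abs(u) ** 2
    return intensity.var() / intensity.mean() ** 2

rng = np.random.default_rng(0)
n = 5000

# Unsaturated: strong coherent component, tiny random perturbations,
# i.e. the "tight collection of points" in the left scatter plot.
unsat = 1.0 + 0.05 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))

# Saturated: fully randomized field (circular complex Gaussian),
# i.e. "points everywhere" in the right scatter plot.
sat = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)

print(scintillation_index(unsat))   # small, close to 0
print(scintillation_index(sat))     # close to 1
```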

The propagation environment has numerous other states that we also need to study to build a more complete picture. It is standard practice to benchmark the performance of different microphones to determine sensor limitations and optimal operating conditions. Similar studies should be done for vector field sensors as new instrumentation becomes available. Vector sensors are important to the U.S. Army for the detection, localization, and tracking of potential threats, providing situational understanding and potentially life-saving technology to our soldiers. The particle velocity sensor we used was just bigger than a pencil; including the windscreen, it was about a foot in diameter. Compare that to a microphone array that could be meters in size to accomplish the same task.

Bibliography

  1. S. Cheinet, M. Cosnefroy, D.K. Wilson, V.E. Ostashev, S.L. Collier and J.E. Cain, “Effets de la turbulence sur des impulsions acoustiques propageant près du sol (Effects of turbulence on acoustic impulses propagating near the ground),” Congrès Français d’Acoustique (French Congress of Acoustics), 11-15 April 2016, Le Mans, France.
  2. L. Ehrhardt, S. Cheinet, D. Juvé and P. Blanc-Benon, “Evaluating a linearized Euler equations model for strong turbulence effects on sound propagation,” J. Acoust. Soc. Am., 133, 1922-1933 (2013).
  3. S.L. Collier, “Fisher Information for a Complex Gaussian Random Variable: Beamforming Applications for Wave Propagation in a Random Medium,” IEEE Trans. Sig. Proc., 53, 4236-4248 (2005).
  4. D.E. Norris, D.K. Wilson and D.W. Thomson, “Correlations Between Acoustic Travel-Time Fluctuations and Turbulence in the Atmospheric Surface Layer,” Acta Acust. Acust., 87, 677-684 (2001).

Acknowledgement:
This research was supported in part by an appointment to the U.S. Army Research Laboratory
Research Associateship Program administered by Oak Ridge Associated Universities.

1pPPB – Emotion Recognition from Speaker-dependent low-level acoustic features

Tejal Udhan – tu13b@my.fsu.edu
Shonda Bernadin – bernadin@eng.famu.fsu.edu
FAMU-FSU College of Engineering,
Department of Electrical and Computer Engineering
2525 Pottsdamer Street Tallahassee
Florida 32310

Popular version of paper 1pPPB: ‘Speaker-dependent low-level acoustic feature extraction for emotion recognition’
Presented Monday afternoon May 7, 2018
175th ASA Meeting, Minneapolis

Speech is the most common and fastest means of communication between humans. This fact has compelled researchers to study acoustic signals as a fast and efficient means of interaction between humans and machines. For authentic human-machine interaction, machines must have sufficient intelligence to recognize human voices and their emotional state. Speech emotion recognition – extracting the emotional state of speakers from acoustic data – plays an important role in enabling machines to be ‘intelligent’. Audio and speech processing provides better, noninvasive, and easier-to-acquire solutions than other biomedical signals such as electrocardiograms (ECG) and electroencephalograms (EEG).

Speech is an informative source for the perception of emotions. For example, we talk in a loud voice when feeling very happy, speak in an uncharacteristically high-pitched voice when greeting a desirable person, or exhibit vocal tremor when something fearful or sad has been experienced. This cognitive recognition of emotions indicates that listeners are able to infer the emotional state of the speaker reasonably accurately even in the absence of visual information [1]. This theory of cognitive emotion inference forms the basis for speech emotion recognition. Acoustic emotion recognition finds many applications in the modern world, ranging from interactive entertainment systems to medical therapies, monitoring, and human safety devices.

We conducted preliminary experiments to classify four human emotions – anger, happiness, sadness, and neutral (no emotion) – in male and female speakers. We chose two simple acoustic features, pitch and intensity, for this analysis. The choice of features is based on readily available tools for their calculation. Pitch is the relative highness or lowness of a tone as perceived by the ear, and intensity is the energy contained in speech as it is produced. Since these are one-dimensional features, they can be easily analyzed by any acoustic emotion recognition system. We designed a decision-tree-based algorithm in MATLAB to perform emotion classification. Samples from the LDC Emotional Prosody dataset were used for this experiment [2]. One sample of each emotion for one male and one female speaker is given below.

{audio missing}

We observed that the male speaker does not have many variations in pitch across the emotions; the pitch is consistently similar for any given emotion. The median intensity over each emotion class, though changing, remains consistently similar to the training data values. As a result, emotion recognition for the male speaker has an accuracy of 88% on acoustic test signals. Although the pitch is nearly constant, there is a clear distinction in intensity between the emotions happy and sad. This dissimilarity in intensity resulted in the higher accuracy of emotion recognition on the male speaker's data. For the female speaker, the pitch ranges anywhere from 230 Hz to 435 Hz across three different emotions – happy, sad, and anger – so the median intensity becomes the sole criterion for emotion recognition. The intensities for happy and angry are almost the same, since both are high-arousal emotions. This resulted in a lower emotion recognition accuracy for the female speaker of about 63%. The overall accuracy of emotion recognition using this method is 75%.
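A classifier of this kind can be sketched in a few lines. The authors implemented theirs in MATLAB; the Python sketch below uses scikit-learn's decision tree on two features per utterance (median pitch and median intensity). The cluster centers are invented for illustration and are not values from the LDC data.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
emotions = ["anger", "happy", "sad", "neutral"]

# Hypothetical (median pitch Hz, median intensity dB) centers per
# emotion -- made-up numbers, purely to show the classification flow.
centers = {"anger": (220, 75), "happy": (240, 72),
           "sad": (180, 60), "neutral": (200, 65)}

X, y = [], []
for label, (pitch, intensity) in centers.items():
    for _ in range(50):   # 50 noisy training samples per emotion
        X.append([pitch + 10 * rng.standard_normal(),
                  intensity + 2 * rng.standard_normal()])
        y.append(label)

clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)
print(clf.predict([[182, 61]]))   # a test point near the "sad" cluster
```

Because the features are one-dimensional quantities, the resulting tree is small and easy to inspect, which matches the motivation for choosing pitch and intensity in the first place.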


Fig. 1. Emotion Recognition Accuracy Comparison

Our algorithm successfully recognized emotions in the male speaker. Since the pitch is consistent within each emotion for the male speaker, the selected features, pitch and intensity, resulted in better emotion recognition accuracy. For the female acoustic data, the selected features are insufficient to describe the emotions; hence, in future research, other features that are independent of voice quality, such as prosodic, formant, or spectral features, will be evaluated.

[1] Fonagy, I., “Emotions, voice and music,” in Sundberg, J. (Ed.), Research Aspects on Singing, Royal Swedish Academy of Music and Payot, Stockholm and Paris, pp. 51–79, 1981.
[2] Liberman, Mark, et al., Emotional Prosody Speech and Transcripts LDC2002S28. Web download. Philadelphia: Linguistic Data Consortium, 2002.

3aPA8 – High Altitude Venus Operational Concept (HAVOC)

Adam Trahan – ajt6261@louisiana.edu
Andi Petculescu – andi@louisiana.edu

University of Louisiana at Lafayette
Physics Department
240 Hebrard Blvd., Broussard Hall
Lafayette, LA 70503-2067

Popular version of paper 3aPA8
Presented Wednesday morning, May 9, 2018
175th ASA Meeting, Minneapolis, MN


Artist’s rendition of the envisioned HAVOC mission. (Credit: NASA Systems Analysis and Concepts Directorate, sacd.larc.nasa.gov/smab/havoc)

The motivation for this research stems from NASA’s proposed High Altitude Venus Operational Concept (HAVOC), which, if successful, would lead to a possible month-long human presence above the cloud layer of Venus.

The atmosphere of Venus is composed primarily of carbon dioxide, with small amounts of nitrogen and other trace molecules at parts-per-million levels. With surface temperatures about 2.5 times Earth's and pressures roughly 100 times higher, the Venusian surface is quite a hostile environment. Higher in the atmosphere, however, the environment becomes relatively benign, with temperatures and pressures similar to those at Earth's surface. In the 40-70 km region, condensational sulfuric acid clouds prevail, which contribute to the so-called “runaway greenhouse” effect.

The main condensable species on Venus is a binary mixture of sulfuric acid dissolved in water. The existence of aqueous sulfuric acid droplets is restricted to a thin region of Venus’ atmosphere, namely 40-70 km above the surface; above and below this main cloud layer, evaporation allows nothing more than a light haze to exist in liquid form. Inside the cloud layer there are three sublayers: the upper cloud layer is produced using energy from the sun, while the lower and middle cloud layers are produced via condensation. The goal of this research is to determine how the lower and middle condensational cloud layers affect the propagation of sound waves as they travel through the atmosphere.

Most waves need a medium to travel; the exception is electromagnetic waves (light), which are able to travel through the vacuum of space. A sound wave, however, requires a fluid (gas or liquid) to support it. The presence of tiny particles affects the propagation of acoustic waves via energy loss processes; these effects have been well studied in Earth’s atmosphere. Using theoretical and numerical techniques, we are able to predict how much an acoustic wave would be weakened (attenuated) for every kilometer traveled in Venus’ clouds.


Figure 2. The frequency dependence of the wave attenuation coefficient. The attenuation is stronger at high frequencies, with a large transition region between 1 and 100 Hz.

Figure 2 shows how the attenuation parameter changes with frequency. At higher frequencies (greater than 100 Hz), the attenuation is larger than at lower frequencies, due primarily to the motion of the liquid cloud droplets as they react to the passing acoustic wave. In the lower frequency region, the attenuation is lower and is due primarily to evaporation and condensation processes, which require energy from the acoustic wave.
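An attenuation coefficient like the one plotted in Figure 2 translates directly into amplitude loss with distance: a plane wave decays as exp(-αx). The short sketch below uses a made-up illustrative value of α, not a result from the paper:

```python
import numpy as np

def amplitude_ratio(alpha_np_per_km, distance_km):
    """Fraction of acoustic pressure amplitude remaining after
    travelling distance_km through a medium whose attenuation
    coefficient is alpha_np_per_km (nepers per kilometer)."""
    return np.exp(-alpha_np_per_km * distance_km)

def loss_dB(alpha_np_per_km, distance_km):
    """The same loss expressed in decibels (1 Np is about 8.686 dB)."""
    return 20 * np.log10(amplitude_ratio(alpha_np_per_km, distance_km))

# Illustrative value only -- not taken from the study's results.
alpha = 0.5   # Np/km
print(amplitude_ratio(alpha, 2.0))   # ~0.368 of the amplitude remains
print(loss_dB(alpha, 2.0))           # ~-8.7 dB over 2 km
```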

For the present study, the cloud environment was treated as a perfect (ideal) gas, which assumes the gas molecules behave like billiard balls, simply bouncing off one another. This assumption is valid for low-frequency sound waves. To complete the model, real-gas effects are added to obtain the background attenuation in the surrounding atmosphere. This will enable us to predict the net losses an acoustic wave is likely to experience at the projected HAVOC altitudes.

The results of this study could prove valuable for guiding the development of acoustic sensors designed to investigate atmospheric properties on Venus.

This research was sponsored by a grant from the Louisiana Space Consortium (LaSPACE).

2pNS8 – Noise Dependent Coherence-Super Gaussian based Dual Microphone Speech Enhancement for Hearing Aid Application using Smartphone

Nikhil Shankar– nxs162330@utdallas.edu
Gautam Shreedhar Bhat – gxs160730@utdallas.edu
Chandan K A Reddy – cxk131330@utdallas.edu
Dr. Issa M S Panahi – imp015000@utdallas.edu
Statistical Signal Processing Laboratory (SSPRL)
The University of Texas at Dallas
800W Campbell Road,
Richardson, TX – 75080, USA

Popular Version of Paper 2pNS8, “Noise dependent coherence-super Gaussian based dual microphone speech enhancement for hearing aid application using smartphone” will be presented Tuesday afternoon, May 8, 2018, 3:25 – 3:40 PM, NICOLLET D3
175th ASA Meeting, Minneapolis

Records from the National Institute on Deafness and Other Communication Disorders (NIDCD) indicate that nearly 15% of adults (37 million) aged 18 and over in the United States report some kind of hearing loss. Worldwide, 360 million people suffer from hearing loss.

Over the past decade, researchers have developed many feasible solutions for the hearing impaired in the form of Hearing Aid Devices (HADs) and Cochlear Implants (CIs). However, the performance of HADs degrades in the presence of different types of background noise, and, due to design constraints, HADs lack the computational power to run the necessary signal processing algorithms. Lately, HAD manufacturers have been using a pen or a necklace as an external microphone to capture speech and transmit the signal and data, by wire or wirelessly, to the HADs. The expense of these auxiliary devices is a limitation. An alternative solution is to use a smartphone, which can capture noisy speech data with its two microphones, perform the complex computations of a speech enhancement algorithm, and transmit the enhanced speech to the HADs.

In this work, the coherence between speech and noise signals [1] is used to obtain a Speech Enhancement (SE) gain function, in combination with a Super Gaussian Joint Maximum a Posteriori (SGJMAP) [2,3] single-microphone SE gain function. The weighted union of these two gain functions strikes a balance between noise suppression and speech distortion. The theory behind the coherence method is that the speech captured at the two microphones is correlated, while the noise is uncorrelated with the speech. The block diagram of the proposed method is shown in Figure 1.
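The coherence idea can be illustrated with standard tools: for two microphone signals sharing a correlated speech component buried in independent noise, the magnitude-squared coherence is high at frequencies where the speech dominates and low elsewhere. Below is a sketch using SciPy with synthetic signals; it illustrates the principle only and is not the authors' gain function.

```python
import numpy as np
from scipy.signal import coherence

fs = 16000
rng = np.random.default_rng(2)
t = np.arange(0, 1.0, 1 / fs)

# A common "speech" component reaching both microphones (a 300 Hz
# tone standing in for voiced speech), plus independent noise at each.
speech = np.sin(2 * np.pi * 300 * t)
mic1 = speech + 0.5 * rng.standard_normal(t.size)
mic2 = speech + 0.5 * rng.standard_normal(t.size)

f, Cxy = coherence(mic1, mic2, fs=fs, nperseg=1024)

# Coherence is near 1 where the correlated component dominates
# (around 300 Hz) and much lower where only uncorrelated noise exists.
bin_300 = np.argmin(np.abs(f - 300))
print(Cxy[bin_300])     # high, close to 1
print(np.median(Cxy))   # low across noise-only frequencies
```

A coherence-based gain function exploits exactly this contrast, attenuating frequency bins where the two microphones disagree.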


Fig. 1. Block diagram of proposed SE method.

As an objective measure of speech quality, we use the Perceptual Evaluation of Speech Quality (PESQ). The Coherence Speech Intelligibility Index (CSII) is used to measure speech intelligibility. PESQ ranges between 0.5 and 4, with 4 indicating high speech quality. CSII ranges between 0 and 1, with 1 indicating high intelligibility. Figure 2 shows plots of PESQ and CSII versus SNR for two noise types, comparing the performance of the proposed SE method with the conventional coherence and LogMMSE SE methods.

Fig.2. Objective measures of speech quality and intelligibility

Along with the objective measures, we performed Mean Opinion Score (MOS) tests on 20 normal-hearing male and female subjects. The subjective test results, shown in Figure 3, illustrate the effectiveness of the proposed method in various types of background noise.

Fig. 3. Subjective test results

Please refer to our lab website https://www.utdallas.edu/ssprl/hearing-aid-project/ for video demos; sample audio files are attached below.

Audio samples:

Noisy

Enhanced

Key References:
[1] N. Yousefian and P. Loizou, “A Dual-Microphone Speech Enhancement algorithm based on the Coherence Function,” IEEE Trans. Audio, Speech, and Lang. Processing, vol. 20, no.2, pp. 599-609, Feb 2012.
[2] T. Lotter and P. Vary, “Speech enhancement by MAP spectral amplitude estimation using a super-Gaussian speech model,” EURASIP Journal on Applied Signal Processing, pp. 1110-1126, 2005.
[3] C. Karadagur Ananda Reddy, N. Shankar, G. Shreedhar Bhat, R. Charan and I. Panahi, “An Individualized Super-Gaussian Single Microphone Speech Enhancement for Hearing Aid Users With Smartphone as an Assistive Device,” in IEEE Signal Processing Letters, vol. 24, no. 11, pp. 1601-1605, Nov. 2017.

*This work was supported by the National Institute of the Deafness and Other Communication Disorders (NIDCD) of the National Institutes of Health (NIH) under the grant number 5R01DC015430-02. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. The authors are with the Statistical Signal Processing Research Laboratory (SSPRL), Department of Electrical and Computer Engineering, The University of Texas at Dallas.

1pPA – Assessment of Learning Algorithms to Model Perception of Sound

Menachem Rafaelof
National Institute of Aerospace (NIA)

Andrew Schroeder
NASA Langley Research Center (NIFS intern, summer 2017)

175th Meeting of the
Acoustical Society of America
Minneapolis, Minnesota
7-11 May 2018
1pPA, Novel Methods in Computational Acoustics II

Sound and its Perception
Sound waves are basically fluctuations of air pressure at points in space. While this simple physical description captures what sound is, its perception is much more complicated, involving physiological and psychological processes.

Physiological processes involve a number of functions during the transmission of sound through the outer, middle, and inner ear before transduction into neural signals. Examples include amplification due to resonance within the outer ear, and substantial attenuation at low frequencies and separation of frequency components within the inner ear. Central processing of sound is based on neural impulses (counts of electrical signals) transferred to the auditory center of the brain. This transformation occurs at different levels in the brain. A major component in this processing is the auditory cortex, where sound is consciously perceived as being, for example, loud, soft, pleasing, or annoying.

Motivation
Currently, an effort is underway to develop and deploy “air taxis,” vehicles for on-demand passenger transport. A major concern with these plans is the operation of air vehicles close to the public and the potential negative impact of their noise. This concern motivates the development of an approach to predict human perception of sound. Such a capability will enable designers to compare different vehicle configurations and their sounds, and to address design factors that are important to noise perception.

Approach
Supervised learning algorithms are a class of machine learning algorithms capable of learning from examples. During the learning stage, samples of input data and matching responses are used to construct a predictive model. This work compared the performance of four supervised learning algorithms – Linear Regression (LR), Support Vector Machines (SVM), Decision Trees (DTs), and Random Forests (RFs) – in predicting human annoyance from sounds. Construction of the predictive models included three stages: 1) sample sounds for training are analyzed in terms of loudness (N), roughness (R), sharpness (S), tone prominence ratio (PR), and fluctuation strength (FS); these parameters quantify various subjective attributes of sound and serve as predictors within the model. 2) Each training sound is presented to a group of test subjects, and their annoyance response (Y in Figure 1) to each sound is gathered. 3) A predictive model (H-hat) is constructed using a machine learning algorithm and is used to predict the annoyance of new sample sounds (Y-hat).

Figure 1: Construction of a model (H-hat) to predict the annoyance of sound. Path a: training sounds are presented to subjects and their annoyance rating (Y) is gathered. Subject rating of training samples and matching predictors are used to construct the model, H-hat. Path b: annoyance of a new sound is estimated using H-hat.
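Stage 3 of this pipeline can be sketched with an off-the-shelf Random Forest. The example below uses synthetic predictor values and a made-up annoyance relationship, purely to show the train-then-predict flow; the study's actual subject responses are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
n = 103   # number of training sounds, as in the study

# Five predictors per sound: loudness N, roughness R, sharpness S,
# prominence ratio PR, fluctuation strength FS (synthetic values).
X = rng.random((n, 5))

# Synthetic annoyance ratings: a loudness-dominated relationship with
# noise -- an invented mapping used only for this illustration.
y = 3 * X[:, 0] + X[:, 1] + 0.5 * X[:, 2] + 0.1 * rng.standard_normal(n)

# Construct the predictive model H-hat from the training examples ...
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# ... then estimate the annoyance (Y-hat) of a new, unseen sound.
new_sound = rng.random((1, 5))
print(model.predict(new_sound))
```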

Findings
In this work, the performance of four models, or learning algorithms, was examined. Construction of these models relied on the annoyance responses of 38 subjects to 103 sounds from 10 different sound sources grouped in four categories: road vehicles, unmanned aerial vehicles for package delivery, distributed electric propulsion aircraft, and a simulated quadcopter. Comparison of these algorithms in terms of prediction accuracy (see Figure 2), model interpretability, versatility, and computation time points to Random Forests as the best algorithm for the task. These results are encouraging considering the precision demonstrated using a low-dimensional model (only five predictors) and the variety of sounds used.

Future Work
• Account for variance in human response data and establish a target error tolerance.
• Explore the use of one or two additional predictors (e.g., impulsiveness and audibility).
• Develop an inexpensive, standard process to gather human response data.
• Collect additional human response data.
• Establish an annoyance scale for air taxi vehicles.

Figure 2: Prediction accuracy for the algorithms examined. Accuracy is expressed as the fraction of points predicted within an error tolerance (in terms of Mean Absolute Error (MAE)) versus the error tolerance, or absolute deviation. For each case, the Area Over the Curve (AOC) represents the total MAE.
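The accuracy measure described in the caption – the fraction of predictions whose absolute error falls within a given tolerance – can be computed in a few lines (with synthetic numbers for illustration):

```python
import numpy as np

def fraction_within(y_true, y_pred, tol):
    """Fraction of predictions whose absolute error is within tol,
    the accuracy measure plotted against tolerance in Figure 2."""
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)) <= tol)

# Made-up annoyance ratings and predictions, for illustration only.
y_true = np.array([3.0, 4.0, 2.5, 5.0])
y_pred = np.array([3.2, 3.5, 2.6, 4.9])
print(fraction_within(y_true, y_pred, 0.25))  # 0.75
print(fraction_within(y_true, y_pred, 0.5))   # 1.0
```

Sweeping `tol` over a range of values and plotting the result reproduces a curve of the kind shown in Figure 2.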