Menachem Rafaelof
National Institute of Aerospace (NIA)

Andrew Schroeder
NASA Langley Research Center (NIFS intern, summer 2017)

175th Meeting of the
Acoustical Society of America
Minneapolis, Minnesota
7-11 May 2018
1pPA, Novel Methods in Computational Acoustics II

Sound and its Perception
Sound waves are fluctuations of air pressure at points in space. While this simple physical description captures what sound is, its perception is much more complicated, involving both physiological and psychological processes.

Physiological processes involve a number of functions that act on sound as it is transmitted through the outer, middle and inner ear, before transduction into neural signals. Examples of these processes include amplification due to resonance within the outer ear, substantial attenuation at low frequencies within the inner ear and frequency component separation within the inner ear. Central processing of sound is based on neural impulses (counts of electrical signals) transferred to the auditory center of the brain. This processing occurs at different levels in the brain. A major component is the auditory cortex, where sound is consciously perceived as being, for example, loud, soft, pleasing, or annoying.

An effort is currently underway to develop and deploy “air taxis”, vehicles for on-demand passenger transport. A major concern with these plans is the operation of air vehicles close to the public and the potential negative impact of their noise. This concern motivates the development of an approach to predict human perception of sound. Such a capability will enable designers to compare different vehicle configurations and their sounds, and to address the design factors that matter most to noise perception.

Supervised learning algorithms are a class of machine learning algorithms capable of learning from examples. During the learning stage, samples of input data and matching response data are used to construct a predictive model. This work compared the performance of four supervised learning algorithms (Linear Regression (LR), Support Vector Machines (SVM), Decision Trees (DTs) and Random Forests (RFs)) in predicting human annoyance from sounds. Construction of the predictive models included three stages: 1) Sample sounds for training are analyzed in terms of loudness (N), roughness (R), sharpness (S), tone prominence ratio (PR) and fluctuation strength (FS). These parameters quantify various subjective attributes of sound and serve as predictors within the model. 2) Each training sound is presented to a group of test subjects and their annoyance response (Y in Figure 1) to each sound is gathered. 3) A predictive model (H-hat) is constructed using a machine learning algorithm and is used to predict the annoyance of new sample sounds (Y-hat).

Figure 1: Construction of a model (H-hat) to predict the annoyance of sound. Path a: training sounds are presented to subjects and their annoyance rating (Y) is gathered. Subject rating of training samples and matching predictors are used to construct the model, H-hat. Path b: annoyance of a new sound is estimated using H-hat.
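As a rough illustration of the two paths in Figure 1, the sketch below fits a linear regression model (one of the four algorithm types compared) by ordinary least squares and then uses it to estimate the annoyance of a new sound. It is a minimal sketch under stated assumptions, not the procedure used in this work: the predictor values, the "true" weights, the noise level and the helper names (`solve`, `fit`, `predict`) are all made up for illustration, and only the five predictor names and the count of 103 training sounds are taken from the text.

```python
import random

# Predictor names from the text: loudness, roughness, sharpness,
# tone prominence ratio and fluctuation strength.
PREDICTORS = ["N", "R", "S", "PR", "FS"]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit(X, y):
    """Path a: least-squares weights (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, x):
    """Path b: estimate annoyance (Y-hat) of one sound from its predictors."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], x))


# Synthetic stand-in data (ASSUMPTION: not the study's measurements).
rng = random.Random(0)
true_w = [2.0, 0.8, 0.5, 0.3, 0.4, 0.2]  # hypothetical intercept + 5 weights
X_train = [[rng.uniform(0, 1) for _ in PREDICTORS] for _ in range(103)]
y_train = [predict(true_w, x) + rng.gauss(0, 0.05) for x in X_train]

h_hat = fit(X_train, y_train)            # model H-hat (path a)
new_sound = [0.5, 0.5, 0.5, 0.5, 0.5]    # predictors of a new sound
y_hat = predict(h_hat, new_sound)        # estimated annoyance (path b)
print(round(y_hat, 2))
```

The other three algorithm types would replace `fit` and `predict` while leaving the two-path structure unchanged.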

In this work, the performance of four models, or learning algorithms, was examined. Construction of these models relied on the annoyance response of 38 subjects to 103 sounds from 10 different sound sources grouped in four categories: road vehicles, unmanned aerial vehicles for package delivery, distributed electric propulsion aircraft and a simulated quadcopter. Comparison of these algorithms in terms of prediction accuracy (see Figure 2), model interpretability, versatility and computation time points to Random Forests as the best algorithm for the task. These results are encouraging considering the precision demonstrated using a low-dimensional model (five predictors only) and the variety of sounds used.

Future Work
• Account for variance in human response data and establish a target error tolerance
• Explore the use of one or two additional predictors (i.e., impulsiveness and audibility)
• Develop an inexpensive, standard process to gather human response data
• Collect additional human response data
• Establish an annoyance scale for air taxi vehicles

Figure 2: Prediction accuracy for the algorithms examined. Accuracy here is expressed as the fraction of points predicted within error tolerance (in terms of Mean Absolute Error (MAE)) vs. error tolerance or absolute deviation. For each case, Area Over the Curve (AOC) represents the total MAE.
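The accuracy measure in Figure 2 can be sketched in a few lines. The example below computes the fraction of predictions that fall within a given absolute-error tolerance and checks that the area over the resulting step curve equals the Mean Absolute Error. The annoyance values and the function names (`fraction_within`, `area_over_curve`) are hypothetical and chosen only for illustration; they are not data or code from this work.

```python
def fraction_within(abs_errors, tolerance):
    """Fraction of points whose absolute error is within the tolerance."""
    return sum(1 for e in abs_errors if e <= tolerance) / len(abs_errors)


def area_over_curve(abs_errors):
    """Exact area above the step curve fraction_within(t), for t >= 0.

    Between consecutive sorted errors the curve is flat, so the area is a
    sum of rectangles of height (1 - fraction already within tolerance).
    """
    n = len(abs_errors)
    area, previous = 0.0, 0.0
    for i, e in enumerate(sorted(abs_errors)):
        area += (e - previous) * (n - i) / n
        previous = e
    return area


# Made-up ratings and predictions (ASSUMPTION: illustrative values only).
y_true = [3.0, 5.5, 4.0, 6.5]
y_pred = [3.5, 5.0, 4.0, 8.0]

abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
mae = sum(abs_err) / len(abs_err)

print(fraction_within(abs_err, 0.5))   # share of points within 0.5
print(area_over_curve(abs_err), mae)   # the two values agree
```

A curve that rises faster leaves less area above it, which is why a smaller AOC corresponds to a smaller MAE when the algorithms are compared.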
