How virtual reality technologies can enable better soundscape design.
W.M. To – firstname.lastname@example.org
Macao Polytechnic Institute, Macao SAR, China.
A. Chung – email@example.com
Smart City Maker, Denmark.
B. Schulte-Fortkamp – firstname.lastname@example.org
Technische Universität Berlin, Berlin, Germany.
Popular version of paper 2aNS, “How virtual reality technologies can enable better soundscape design”
Presented Tuesday morning, November 29, 2016
172nd ASA Meeting, Honolulu
Community members have sought a better quality of life, including good sound quality, as part of smart city initiatives. While many governments have paid special attention to waste management and to air and water pollution, their work on the acoustic environment in cities has been directed mainly toward noise control, in particular the control of transportation noise. Governments that care about tranquility in cities rely primarily on setting so-called acceptable noise levels, i.e. quantities used only for compliance and improvement. Sound quality is most often ignored. Recently, the International Organization for Standardization (ISO) released a standard on soundscape. However, sound quality is subjective and depends heavily on how people perceive sound in different contexts. For example, China’s public parks are well known to be rather noisy in the morning because of boisterous amateur musicians and dancers, many of them retirees and housewives, known as “Da Ma”. These activities would cause numerous complaints if they happened in other parts of the world, but in China they are part of everyday life.
According to the ISO soundscape guideline, people can use sound walks, questionnaire surveys, and even laboratory tests to determine sound quality during the soundscape design process. We believe that current virtual reality technology makes it possible to create an application that immerses designers and community stakeholders in a simulated environment, so that they can perceive and compare changes in sound quality and give feedback on different soundscape designs. An app has been developed specifically for this purpose. Figure 1 shows a simulated environment in which a student or visitor arrives at the school’s campus, walks across the lawn, passes a multifunctional court, and reaches an open area with table tennis tables. She or he can experience different ambient sounds and can click an object to increase or decrease the volume of the sound from that object. After hearing sounds from different sources at different locations, the person can evaluate the level of acoustic comfort at each location and express their feelings about the overall soundscape. She or he can rate the sonic environment for perceived loudness and for pleasantness on a 5-point scale, from 1 = ‘heard nothing/not at all pleasant’ to 5 = ‘very loud/pleasant’. In addition, because the sonic environment is multi-dimensional, she or he can also describe the acoustic environment and soundscape in free words.
Figure 1. A simulated soundwalk in a school campus.
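The per-object volume control in the app changes one source's contribution to what the listener hears. A minimal sketch of the underlying level arithmetic, assuming (this is not stated in the paper) that the sources are mutually incoherent so that their energies add; the source names and levels are hypothetical:

```python
import math

def combined_level(levels_db):
    """Energetic (incoherent) sum of sound pressure levels in dB."""
    total_energy = sum(10 ** (level / 10) for level in levels_db)
    return 10 * math.log10(total_energy)

def adjust_source(levels_db, name, gain_db):
    """Return a copy of the source levels with one object's level changed by gain_db."""
    adjusted = dict(levels_db)
    adjusted[name] += gain_db
    return adjusted

# Hypothetical source levels (dB) heard at one location on the simulated campus walk.
sources = {"fountain": 60.0, "basketball_court": 60.0, "birds": 50.0}

before = combined_level(sources.values())
after = combined_level(adjust_source(sources, "basketball_court", -10.0).values())
print(f"{before:.1f} dB -> {after:.1f} dB")
```

Cutting one of two equal 60 dB sources by 10 dB lowers the combined level by only about 2.4 dB here, because the remaining sources still dominate.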
To, W. M., Mak, C. M., and Chung, W. L. Are the noise levels acceptable in a built environment like Hong Kong? Noise and Health, 2015. 17(79): 429-439.
ISO. ISO 12913-1:2014 Acoustics – Soundscape – Part 1: Definition and Conceptual Framework, Geneva: International Organization for Standardization, 2014.
Kang, J. and Schulte-Fortkamp, B. (Eds.). Soundscape and the Built Environment, CRC Press, 2016.
School of Mechanical and Aerospace Eng., Seoul National University
301-1214, 1 Gwanak-ro, Gwanak-gu, Seoul 151-742, Republic of Korea
Popular version of paper 4aEA1, “Integrated simulation model for prediction of acoustic environment of launch vehicle”
Presented Thursday morning, December 1, 2016
172nd ASA Meeting, Honolulu
Strictly speaking, a “sound” is a pressure fluctuation in the air. When we hear a bus passing, for example, our ears are sensing the pressure fluctuations the bus creates. In daily life we rarely encounter pressure fluctuations much larger than those of common noises, but in special cases it happens. In movies, windows are often shown shattering when someone screams loudly or at a high pitch. This is usually exaggerated, but it is not outside the realm of physical possibility.
Because louder sounds correspond to larger pressure fluctuations, sound can cause real engineering problems for very loud systems such as rockets, where the fluctuations are large enough to do damage. Rocket launches are particularly loud, and the resulting pressure changes in the air act on the surface of the launch vehicle as a force, as shown in Figure 1.
Figure 1. The Magnitude of Acoustic Loads on the Launch Vehicle
During launch (Figure 2), the sound pressure level exceeds 180 dB, which corresponds to a pressure fluctuation of about 20,000 pascals. That is about 20% of atmospheric pressure, which is very large. These pressure fluctuations can damage communication equipment and antenna panels and can cause the fairing, the protective cone covering the satellite, to malfunction. In engineering, the load created by launch noise is called the acoustic load, and it is the subject of many ongoing studies.
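The 180 dB and 20,000 Pa figures are linked by the standard definition of sound pressure level, SPL = 20 log10(p / p_ref) with p_ref = 20 µPa. A short sketch of the conversion, assuming standard atmospheric pressure of 101,325 Pa for the comparison:

```python
import math

P_REF = 20e-6          # reference pressure for SPL in air: 20 micropascals
P_ATM = 101_325.0      # standard atmospheric pressure in pascals

def spl_to_pressure(spl_db):
    """Convert a sound pressure level (dB re 20 uPa) to RMS pressure in pascals."""
    return P_REF * 10 ** (spl_db / 20)

def pressure_to_spl(pressure_pa):
    """Inverse conversion: RMS pressure in pascals to dB re 20 uPa."""
    return 20 * math.log10(pressure_pa / P_REF)

p = spl_to_pressure(180)
print(f"{p:,.0f} Pa, {100 * p / P_ATM:.1f}% of atmospheric pressure")
```

The ratio comes out at about 19.7%, consistent with the roughly 20% quoted above.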
To rocket engineers, studies of the relationship between a launch vehicle and its acoustic load fall under “prediction and control.” Prediction covers two aspects: the internal acoustic load and the external acoustic load. The external acoustic load is the noise generated directly by the rocket’s exhaust jet, while the internal acoustic load is the sound transmitted from outside the vehicle to its interior. There are two ways to predict the external acoustic load: an empirical method and a numerical method. The empirical method was developed by NASA in 1972 and draws on data collected from various studies. The numerical method uses computer models to solve the mathematical equations that describe noise generation and wave propagation. As computers become more powerful, this method continues to gain favor. However, numerical methods require so much calculation time that they often need dedicated computing centers. Our team instead focused on the faster, more efficient empirical method.
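To give a flavor of how an empirical method works, here is a minimal sketch of the first step in the commonly described form of the NASA empirical approach: the overall sound power is taken as a small fraction (the acoustic efficiency) of the exhaust jet's mechanical power. The 0.5% efficiency, the thrust and the exhaust velocity below are illustrative assumptions, not values from this study, and the authors' improved method is not reproduced here.

```python
import math

W_REF = 1e-12  # reference sound power: 1 picowatt

def overall_sound_power_level(thrust_n, exhaust_velocity_ms, efficiency=0.005, engines=1):
    """Estimate the overall sound power level (dB re 1 pW) of a rocket exhaust.

    Mechanical jet power is taken as 0.5 * thrust * exhaust velocity per engine;
    an assumed fixed fraction of it (the acoustic efficiency) is radiated as sound.
    """
    mechanical_power = 0.5 * engines * thrust_n * exhaust_velocity_ms  # watts
    acoustic_power = efficiency * mechanical_power                     # watts
    return 10 * math.log10(acoustic_power / W_REF)

# Hypothetical engine: 1 MN thrust, 2,500 m/s exhaust velocity.
print(f"{overall_sound_power_level(1.0e6, 2500.0):.1f} dB re 1 pW")
```

Later steps in such procedures typically spread this power over frequency and along the exhaust plume before estimating the loads at the vehicle surface.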
Figure 3 shows the results of our calculations, depicting the expected sound spectrum. Our approach can also take into account physical effects during lift-off, such as sound reflection, diffraction and jet impingement, that can change the results of the original empirical method.
Meanwhile, our team used statistical energy analysis (SEA) to predict the internal acoustic load caused by the predicted external acoustic load. SEA is widely used to predict internal noise environments, not only in launch vehicles but also in aircraft and automobiles. Our research team used a program called VA One SEA to predict these noise effects, as shown in Figure 4.
Figure 4. Modeling of the Payloads and Forcing of the External Acoustic Loads
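Statistical energy analysis treats a structure as a set of subsystems that exchange vibrational energy. A minimal sketch of the steady-state power balance for a few subsystems (for example, a fairing wall and the air cavity inside it) is shown below. The loss factors, frequency and input power are made-up illustrative numbers, not values from VA One or from this study.

```python
import numpy as np

def sea_energies(omega, damping, coupling, power_in):
    """Solve the steady-state SEA power balance for subsystem energies.

    damping[i]     : damping loss factor eta_i of subsystem i
    coupling[i][j] : coupling loss factor eta_ij from subsystem i to j
    power_in[i]    : external power injected into subsystem i (W)
    """
    n = len(damping)
    eta = np.asarray(coupling, dtype=float)
    A = np.zeros((n, n))
    for i in range(n):
        # Power leaving subsystem i: its own damping plus coupling to all others.
        A[i, i] = damping[i] + sum(eta[i, j] for j in range(n) if j != i)
        for j in range(n):
            if j != i:
                # Power flowing back into subsystem i from subsystem j.
                A[i, j] = -eta[j, i]
    return np.linalg.solve(omega * A, np.asarray(power_in, dtype=float))

omega = 2 * np.pi * 500.0            # band centre frequency (rad/s), illustrative
damping = [0.01, 0.001]              # structure, internal cavity
coupling = [[0.0, 0.002],            # structure -> cavity
            [0.004, 0.0]]            # cavity -> structure
energies = sea_energies(omega, damping, coupling, power_in=[1.0, 0.0])
print(energies)
```

With power injected only into the structure, the cavity energy settles at eta_12 / (eta_2 + eta_21) times the structural energy; in a real model the injected power would come from the predicted external acoustic load.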
After predicting the internal acoustic load, we studied ways to reduce it. A common approach is to attach noise-reducing material to the structure. However, the extra weight of this material can degrade the vehicle’s performance. To overcome this drawback, we have also begun studying active noise control, which reduces noise by generating a sound wave in antiphase with the unwanted sound so that the two cancel. Figure 5 shows experimental results of single-input, single-output (SISO) active noise control: the noise reduction is significant, especially at low frequencies.
Figure 5. Experimental Results of SISO Active Noise Control
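The idea of cancelling noise with an antiphase wave can be illustrated with an adaptive filter. The sketch below is a single-channel least-mean-squares (LMS) canceller, a common textbook choice and not necessarily the algorithm the team selected. It assumes a clean reference signal and ignores the acoustic path between the control loudspeaker and the error microphone, which practical systems handle with variants such as filtered-x LMS. The tone frequency, sample rate, filter length and step size are all illustrative.

```python
import numpy as np

def lms_cancel(reference, disturbance, num_taps=16, mu=0.01):
    """Single-channel (SISO) LMS active noise control.

    Simplification: the loudspeaker-to-microphone path is treated as ideal,
    so the anti-noise adds directly to the disturbance. Returns the residual
    the error microphone would record.
    """
    weights = np.zeros(num_taps)
    taps = np.zeros(num_taps)                  # recent reference samples, newest first
    residual = np.zeros(len(disturbance))
    for n in range(len(disturbance)):
        taps = np.roll(taps, 1)
        taps[0] = reference[n]
        anti_noise = -weights @ taps           # loudspeaker output in antiphase
        residual[n] = disturbance[n] + anti_noise
        weights += mu * residual[n] * taps     # LMS step that reduces residual power
    return residual

# Illustrative signals: a 100 Hz tone plus a little uncorrelated sensor noise.
fs = 8000
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
reference = np.sin(2 * np.pi * 100 * t)
disturbance = 0.8 * np.sin(2 * np.pi * 100 * t + 0.7) + rng.normal(0.0, 0.01, fs)

residual = lms_cancel(reference, disturbance)
before = np.mean(disturbance[-2000:] ** 2)
after = np.mean(residual[-2000:] ** 2)
print(f"reduction over the last 0.25 s: {10 * np.log10(before / after):.1f} dB")
```

Tonal, low-frequency noise is generally the most favorable case for this kind of controller; the remaining residual here is mostly the uncorrelated sensor noise, which no reference-based canceller can remove.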
Our research team applied these acoustic load prediction and control methods to the Korean launch vehicle KSR-III. Through this application, we developed an improved empirical prediction method that is more accurate than previous methods, and we demonstrated the usefulness of noise control by establishing the best algorithm for our experimental facility and for the active noise control region.
Noise, vibration, and harshness (NVH) of smartphones
Inman Jang – email@example.com
Tae-Young Park – firstname.lastname@example.org
Won-Suk Ohm – email@example.com
50, Yonsei-ro, Seodaemun-gu
Heungkil Park – firstname.lastname@example.org
Samsung Electro-Mechanics Co., Ltd.
150, Maeyeong-ro, Yeongtong-gu
Suwon-si, Gyeonggi-do 16674
Popular version of paper 1aNS5, “Controlling smartphone vibration and noise”
Presented Monday morning, November 28, 2016
172nd ASA Meeting, Honolulu
Noise, vibration, and harshness, also known as NVH, refers to the comprehensive engineering of noise and vibration of a device through stages of their production, transmission, and human perception. NVH is a primary concern in car and home appliance industries because many consumers take into account the quality of noise when making buying decisions. For example, a car that sounds too quiet (unsafe) or too loud (uncomfortable) is a definite turnoff. That said, a smartphone may strike you as an acoustically innocuous device (unless you are not a big fan of Metallica ringtones), for which the application of NVH seems unwarranted. After all, who would expect the roar of a Harley from a smartphone? But think again. Albeit small in amplitude (less than 30 dB), smartphones emit an audible buzz that, because of the close proximity to the ear, can degrade the call quality and cause annoyance.
Figure 1: Smartphone noise caused by MLCCs
The major culprit behind smartphone noise is the collective vibration of tiny electronic components known as multilayer ceramic capacitors (MLCCs). An MLCC is basically a capacitor made of piezoelectric ceramics, which expand and contract when a voltage is applied. A typical smartphone has a few hundred MLCCs soldered to the circuit board inside. The almost simultaneous pulsations of these MLCCs are transmitted to and amplified by the circuit board, whose vibration eventually produces the distinct buzzing noise shown in Fig. 1. (Imagine a couple hundred rambunctious little kids jumping up and down on a floor almost in unison!) The problem has been exacerbated by the recent trend summed up as “the slimmer the better”: because a slimmer circuit board flexes much more easily, it transmits and radiates more vibration and noise.
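Why does it matter that the MLCCs pulse almost in unison? A minimal sketch, assuming idealized identical sources of equal amplitude at equal distance from the listener and ignoring the circuit board's own amplification: when sources are in phase their pressures add, so N sources are 20 log10(N) dB louder than one, whereas with random phases only their energies add on average, giving about 10 log10(N) dB.

```python
import numpy as np

def summed_energy(phases_rad):
    """Mean-square pressure of N unit-amplitude sources, relative to a single source.

    Each source is an idealized pressure phasor exp(i*phase) at the listener;
    the phasors are added and the squared magnitude of the sum is returned.
    """
    total = np.sum(np.exp(1j * np.asarray(phases_rad)))
    return np.abs(total) ** 2

def to_db(energy_ratio):
    return 10 * np.log10(energy_ratio)

n = 200                                   # "a couple hundred" MLCCs
rng = np.random.default_rng(0)
in_unison = to_db(summed_energy(np.zeros(n)))               # 20*log10(200), about 46 dB
jittered = to_db(summed_energy(rng.normal(0.0, 0.3, n)))    # nearly in unison
random_phase = to_db(np.mean([summed_energy(rng.uniform(0.0, 2 * np.pi, n))
                              for _ in range(2000)]))        # close to 10*log10(200), about 23 dB
print(f"in unison: {in_unison:.1f} dB, jittered: {jittered:.1f} dB, random: {random_phase:.1f} dB")
```

Even modest timing jitter leaves the sum close to the fully coherent case, so near-simultaneous pulsing behaves almost like perfectly synchronized pulsing.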
Recently, Yonsei University and Samsung Electro-Mechanics in South Korea joined forces to address this problem. Their comprehensive NVH regime includes visualizing smartphone noise and vibration (transmission), identifying and replacing the most problematic MLCCs (production), and evaluating the harshness of smartphone noise (human perception). To visualize smartphone noise, a technique known as near-field acoustic holography is used to produce a sound map, as shown in Fig. 2, in which the spatial distribution of sound pressure, acoustic intensity or surface velocity is overlaid on a snapshot of the smartphone. Such sound maps help smartphone designers form a detailed picture of what is going on acoustically and identify the groups of MLCCs most responsible for vibrating the circuit board. Engineers can then take corrective action by replacing the (cheap) problematic MLCCs with (expensive) low-vibration MLCCs. Finally, the outcome of the noise and vibration engineering is measured not only in terms of physical attributes such as sound pressure level, but also in terms of their psychological correlates, such as loudness and overall psychoacoustic annoyance. This three-pronged strategy (addressing production, transmission, and human perception) has proven highly effective, and Samsung Electro-Mechanics currently offers the NVH service to a number of major smartphone vendors around the world.
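A minimal sketch of the core step of planar near-field acoustic holography, assuming a single-frequency pressure field measured on a flat grid parallel to the phone: the measured pressure is decomposed into plane-wave components with a 2-D FFT, each component is propagated back to the source plane, and a sharp wavenumber cutoff (a simple, assumed regularization; practical systems use smoother filters) keeps evanescent components from amplifying measurement noise. Grid spacing, frequency, distance, cutoff and the Gaussian "vibrating patch" are illustrative, and this is not the implementation used by the authors.

```python
import numpy as np

def propagate_plane(pressure, dx, k, distance, k_cut=None):
    """Propagate a sampled complex pressure field between parallel planes.

    Positive distance moves away from the source, negative distance moves back
    toward it (time convention exp(-i*omega*t)). If k_cut is given, components
    with in-plane wavenumber sqrt(kx**2 + ky**2) > k_cut are discarded.
    """
    ny, nx = pressure.shape
    kx = 2 * np.pi * np.fft.fftfreq(nx, d=dx)
    ky = 2 * np.pi * np.fft.fftfreq(ny, d=dx)
    KX, KY = np.meshgrid(kx, ky)                            # shape (ny, nx)
    kz = np.sqrt((k**2 - KX**2 - KY**2).astype(complex))    # imaginary for evanescent waves
    propagator = np.exp(1j * kz * distance)
    if k_cut is not None:
        propagator[np.hypot(KX, KY) > k_cut] = 0.0
    return np.fft.ifft2(np.fft.fft2(pressure) * propagator)

# Illustrative set-up: 2 kHz tone, 64 x 64 grid with 2 mm spacing, hologram 10 mm away.
k = 2 * np.pi * 2000.0 / 343.0
dx = 0.002
x = (np.arange(64) - 32) * dx
X, Y = np.meshgrid(x, x)
source = np.exp(-(X**2 + Y**2) / (2 * 0.008**2))     # hypothetical vibrating patch
hologram = propagate_plane(source, dx, k, 0.01)       # what a microphone array would measure
reconstructed = propagate_plane(hologram, dx, k, -0.01, k_cut=600.0)
```

The closer the hologram plane is to the board, the more evanescent detail survives above the noise floor, which is what lets near-field holography resolve features much smaller than the acoustic wavelength (about 17 cm at 2 kHz).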
Indris’ melodies are individually distinctive and genetically driven
Marco Gamba – email@example.com
Cristina Giacoma – firstname.lastname@example.org
University of Torino
Department of Life Sciences and Systems Biology
Via Accademia Albertina 13
10123 Torino, Italy
Popular version of paper 2aABa3 “Melody in my head, melody in my genes? Acoustic similarity, individuality and genetic relatedness in the indris of Eastern Madagascar”
Presented Tuesday morning, November 29, 2016
172nd ASA Meeting, Honolulu
Humans are exceptionally good at identifying the voices of friends and relatives. This ability rests on the acoustic structure of our words, which conveys not only verbal information (the meaning of our words) but also non-verbal cues (such as the sex and identity of the speaker).
In animal communication, recognizing members of the same species can also be important. Birds and mammals may adjust signals that function in neighbor recognition, and discriminating between a known neighbor and a stranger can lead to strikingly different responses in terms of territorial defense.
Indris (Indri indri) are the only lemurs that produce group songs, and they are among the few primate species that communicate using articulated singing displays. The most distinctive portions of the indris’ song are called descending phrases, which consist of two to five units, or notes. We recorded 21 groups of indris in the eastern rainforests of Madagascar from 2005 to 2015, identifying individuals in each recording by their natural markings. Because we noticed that encounters between groups were rare, we hypothesized that song might provide members of the same species with information about the sex and identity of an individual singer and about the group it belongs to.
We found that we could effectively discriminate between the descending phrases of individual indris, showing that the phrases can potentially advertise the singer’s sex and individual identity. This strengthened the hypothesis that song may play a role in processes such as kin and mate recognition. Finding a degree of group specificity in the song also supports the idea that neighbor-stranger recognition is important in indris and that the song may function to announce territorial occupation and maintain spacing between groups.
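Discriminating individuals from their phrases is a classification problem, and the Figure 3 caption refers to a discriminant model. A minimal sketch of linear discriminant analysis written with NumPy, assuming each phrase has already been summarized as a numeric feature vector (the features and validation scheme the authors used are not given here). The singers, features and data are synthetic and only show the mechanics.

```python
import numpy as np

def fit_lda(features, labels):
    """Fit linear discriminant analysis with a pooled (shared) covariance matrix."""
    classes = np.unique(labels)
    means = np.array([features[labels == c].mean(axis=0) for c in classes])
    centered = np.vstack([features[labels == c] - means[i] for i, c in enumerate(classes)])
    pooled_cov = centered.T @ centered / (len(features) - len(classes))
    log_priors = np.log([np.mean(labels == c) for c in classes])
    return classes, means, np.linalg.inv(pooled_cov), log_priors

def predict_lda(model, features):
    """Assign each row to the class with the highest linear discriminant score."""
    classes, means, inv_cov, log_priors = model
    # score_c(x) = x' S^-1 m_c - 0.5 m_c' S^-1 m_c + log(prior_c)
    linear = features @ inv_cov @ means.T
    offset = -0.5 * np.sum((means @ inv_cov) * means, axis=1) + log_priors
    return classes[np.argmax(linear + offset, axis=1)]

# Synthetic example: 3 hypothetical singers, 40 phrases each, 4 acoustic features per phrase.
rng = np.random.default_rng(42)
singer_means = np.array([[0.0, 0.0, 0.0, 0.0],
                         [3.0, 0.0, 0.0, 0.0],
                         [0.0, 3.0, 0.0, 0.0]])
X = np.vstack([rng.normal(m, 1.0, size=(40, 4)) for m in singer_means])
y = np.repeat(np.array(["indri_A", "indri_B", "indri_C"]), 40)

train = np.arange(len(y)) % 2 == 0        # even rows to fit, odd rows held out
model = fit_lda(X[train], y[train])
accuracy = np.mean(predict_lda(model, X[~train]) == y[~train])
print(f"held-out accuracy: {accuracy:.2f} (chance: {1 / 3:.2f})")
```

Scoring on held-out phrases, rather than the phrases used for fitting, is what makes a comparison with chance meaningful.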
Traditionally, primate songs have been considered an example of a genetically determined display. The next step in our research was therefore to examine whether the structure of the phrases relates to the genetic relatedness of the indris. We found a significant correlation between the genetic relatedness of the studied individuals and the acoustic similarity of their song phrases, suggesting that genetic relatedness may play a role in determining song similarity.
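Correlating genetic relatedness with acoustic similarity means comparing two matrices of pairwise values. A standard way to test such a correlation is a Mantel test, sketched below as an assumption about the kind of analysis involved; the summary does not say which test the authors used. The matrices are small made-up examples for hypothetical individuals.

```python
import numpy as np

def mantel_test(matrix_a, matrix_b, permutations=999, seed=0):
    """Pearson correlation between two pairwise matrices with a permutation p-value.

    Only the upper triangle (each pair once, diagonal excluded) is compared.
    The null distribution is built by permuting the rows and columns of
    matrix_b together, which keeps its internal structure intact.
    """
    a = np.asarray(matrix_a, dtype=float)
    b = np.asarray(matrix_b, dtype=float)
    upper = np.triu_indices_from(a, k=1)
    observed = np.corrcoef(a[upper], b[upper])[0, 1]
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(permutations):
        order = rng.permutation(len(b))
        shuffled = b[np.ix_(order, order)]
        if np.corrcoef(a[upper], shuffled[upper])[0, 1] >= observed:
            hits += 1
    p_value = (hits + 1) / (permutations + 1)   # one-sided test for a positive association
    return observed, p_value

# Made-up data for 10 hypothetical individuals.
rng = np.random.default_rng(3)
n = 10
upper_part = np.triu(rng.uniform(0.0, 0.5, size=(n, n)), k=1)
relatedness = upper_part + upper_part.T                 # symmetric, zero diagonal
noise = rng.normal(0.0, 0.05, size=(n, n))
similarity = relatedness + (noise + noise.T) / 2        # similarity loosely tracks relatedness
r, p = mantel_test(relatedness, similarity)
print(f"Mantel r = {r:.2f}, p = {p:.3f}")
```

Permuting individuals rather than individual matrix cells matters because the pairwise values are not independent: every individual appears in many pairs.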
For the first time, we found evidence that the similarity of a primate vocal display varies within a population in a way that is strongly associated with kinship. When examining differences between the sexes, we found that male offspring produced phrases more similar to their fathers’ phrases, while daughters’ phrases did not resemble those of either parent.
The potential for kin detection may play a vital role in shaping relationships within a population, regulating dispersal, and avoiding inbreeding. Singing displays may advertise kinship and thereby signal against potential mating, information that females, and to a lesser degree males, can use when forming a new group. Unfortunately, we still do not know whether indris can perceptually decode this information or how they use it in everyday life. But work like this lays the groundwork for understanding primates’ mating and social systems and for developing better conservation methods.
Belin, P. Voice processing in human and non-human primates. Philosophical Transactions of the Royal Society B: Biological Sciences, 2006. 361: p. 2091-2107.
Randall, J. A. Discrimination of foot drumming signatures by kangaroo rats, Dipodomys spectabilis. Animal Behaviour, 1994. 47: p. 45-54.
Gamba, M., Torti, V., Estienne, V., Randrianarison, R. M., Valente, D., Rovara, P., Giacoma, C. The Indris Have Got Rhythm! Timing and Pitch Variation of a Primate Song Examined between Sexes and Age Classes. Frontiers in Neuroscience, 2016. 10: p. 249.
Torti, V., Gamba, M., Rabemananjara, Z. H., Giacoma, C. The songs of the indris (Mammalia: Primates: Indridae): contextual variation in the long-distance calls of a lemur. Italian Journal of Zoology, 2013. 80, 4.
Barelli, C., Mundry, R., Heistermann, M., Hammerschmidt, K. Cues to androgen and quality in male gibbon songs. PLoS ONE, 2013. 8: e82748.
Figure 1. A female indri with offspring in the Maromizaha Forest, Madagascar. Maromizaha is a New Protected Area located in the Region Alaotra-Mangoro, east of Madagascar. It is managed by GERP (Primate Studies and Research Group). At least 13 species of lemurs have been observed in the area.
Figure 2. Spectrograms of an indri song showing a typical sequence of different units. In the enlarged area, the pitch contour in red shows a typical “descending phrase” of 4 units. The indris also emit phrases of 2, 3 and more rarely 5 or 6 units.
Figure 3. A 3D plot of the dimensions (DF1, DF2, DF3) generated by a discriminant model that successfully assigned descending phrases of four units (DP4) to the individual that emitted them. Colours denote individuals. Descending phrases of two units (DP2) and three units (DP3) were also correctly classified at rates significantly above chance.
What the f***? Making sense of expletives in The Wire
Erica Gold – email@example.com
Dan McIntyre – firstname.lastname@example.org
University of Huddersfield
Huddersfield, HD1 3DH
Popular version of paper 3pSC87, “What the f***? Making sense of expletives in ‘The Wire’”
Presented Wednesday afternoon, November 30, 2016
172nd ASA Meeting, Honolulu
In Season one of HBO’s acclaimed crime drama The Wire, Detectives Jimmy McNulty and ‘Bunk’ Moreland are investigating old homicide cases, including the murder of a young woman shot dead in her apartment. McNulty and Bunk visit the scene of the crime to try and figure out exactly how the woman was killed. What makes the scene dramatically unusual is that, engrossed in their investigation, the two detectives communicate with each other using only the word “fuck” and its variants (e.g. motherfucker, fuckity fuck, etc.). Somehow, using only this vocabulary, McNulty and Bunk are able to communicate in a meaningful way. The scene is absorbing, engaging and even funny, and it leads to a fascinating question for linguists: how is the viewer able to understand what McNulty and Bunk mean when they communicate using such a restricted set of words?
To investigate this, we first looked at what other linguists have discovered about the word fuck. What is clear is that it’s a hugely versatile word that can be used to express a range of attitudes and emotions. On the basis of this research, we came up with a classification scheme which we then used to categorise all the variants of fuck in the scene. Some seemed to convey disbelief and some were used as insults. Some indicated surprise or realization while others functioned to intensify the following word. And some were idiomatic set phrases (e.g. Fuckin’ A!). Our next step was to see whether there was anything in the acoustic properties of the characters’ speech that would allow us to explain why we interpreted the fucks in the way that we did.
The entire conversation between Bunk and McNulty lasts around three minutes and contains a total of 37 productions of fuck (including its variants). Because these productions vary, the one clear and consistent segment in every word was the <u> in fuck, so this became the focus of our study. The <u> in fuck is the same sound you find in the words strut and duck and is represented as /ʌ/ in the International Phonetic Alphabet. When analysing vowel sounds such as <u>, we can look at a number of aspects of their production.
In this study, we looked at the quality of the vowel by measuring the first three formants. In phonetics, the term formant refers to an acoustic resonance of the vocal tract. The first two formants can tell us whether the production sounds more like “fuck” than “feck” or “fack,” and the third formant gives us information about voice quality. We also looked at the duration of the <u>: “fuuuuuck” versus “fuck.”
After measuring each instance, we ran statistical tests to see whether there was any relationship between the way it was said and how we had categorised its meaning. Our results showed that, once we accounted for differences in the vocal tract shapes of the actors playing Bunk and McNulty, the quality of the vowels was relatively consistent. That is, we get a lot of <u> sounds, rather than “eh,” “oo” or “ih.”
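One widely used way to account for differences between speakers’ vocal tracts is Lobanov normalization, which converts each speaker’s formant values to z-scores using that speaker’s own mean and standard deviation. The summary does not say which normalization the authors used, so treat this as an illustrative sketch; the speaker labels and formant values are invented.

```python
from statistics import mean, stdev

def lobanov(formants_by_speaker):
    """z-score each formant within each speaker.

    formants_by_speaker maps a speaker name to a list of (F1, F2, F3) tuples in Hz.
    """
    normalized = {}
    for speaker, tokens in formants_by_speaker.items():
        columns = list(zip(*tokens))                      # one column per formant
        stats = [(mean(col), stdev(col)) for col in columns]
        normalized[speaker] = [
            tuple((value - m) / s for value, (m, s) in zip(token, stats))
            for token in tokens
        ]
    return normalized

# Invented measurements (Hz) for three tokens of the vowel from each actor.
tokens = {
    "actor_1": [(600, 1200, 2500), (650, 1250, 2550), (700, 1150, 2450)],
    "actor_2": [(680, 1350, 2700), (720, 1400, 2750), (760, 1300, 2650)],
}
for speaker, values in lobanov(tokens).items():
    print(speaker, [tuple(round(v, 2) for v in token) for token in values])
```

The invented values were chosen so that both actors come out identical after normalization, even though actor_2’s raw formants are all higher: the method keeps each speaker’s relative vowel pattern and removes the overall offset and spread.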
The productions of fucks that were associated with the category of realization were found to be very similar to those associated with disbelief. However, disbelief and realization did contrast with those that were used as insults, idiomatic phrases, or functional words. Therefore, it may be more appropriate to classify the meaning into fewer categories – those that signify disbelief or realization, and those that are idiomatic, insults, or functional. It is important to remember, however, that the latter group of three meanings are represented by fewer examples in the scene. Our initial results show that these two broad groups may be distinguished through the length of the vowel – short <u> is more associated with an insult, function, or idiomatic use rather than disbelief or surprise (for which the vowel tends to be longer). In the future, we would also like to analyse the intonation of the productions. See if you can hear the difference between these samples:
Example 1: realization/surprise
Example 2: general expletive which falls under the functional/idiomatic/insult category
Our results shed new light on what for linguists is an old problem: how do we make sense of what people say when speakers so very rarely say exactly what they mean? Experts in pragmatics (the study of how meaning is affected by context) have suggested that we infer meaning when people break conversational norms. In the example from The Wire, it’s clear that the characters are breaking normal communicative conventions. But pragmatic methods of analysis don’t get us very far in explaining how we are able to infer such a range of meaning from such limited vocabulary. Our results confirm that the answer to this question is that meaning is not just conveyed at the lexical and pragmatic level, but at the phonetic level too. It’s not just what we say that’s important, it’s how we fucking say it!
Popular version of paper 1aSC31, “Horseshoe bat inspired reception dynamics embed dynamic features into speech signals.”
Presented Monday morning, November 28, 2016
172nd ASA Meeting, Honolulu
Have you ever had difficulty understanding what someone was saying to you while walking down a busy big city street, or in a crowded restaurant? Even if that person was right next to you? Words can become difficult to make out when they get jumbled with the ambient noise – cars honking, other voices – making it hard for our ears to pick up what we want to hear. But this is not so for bats. Their ears can move and change shape to precisely pick out specific sounds in their environment.
This biosonar capability inspired our research on artificial ears aimed at improving the accuracy of automatic speech recognition (ASR) systems and speaker localization. We asked: could we enrich a speech signal with direction-dependent, dynamic features by using bat-inspired reception dynamics?
Horseshoe bats, for example, which are found throughout Africa, Europe and Asia and are so named for the shape of their noses, can change the shape of their outer ears to extract additional information about the environment from incoming ultrasonic echoes. Their sophisticated biosonar system emits ultrasonic pulses and listens to the echoes that bounce back from surrounding objects, while the bats change the shape of their ears (something other mammals cannot do). This allows them to learn about their surroundings and helps them navigate and hunt in their dense forest habitats.
While probing the environment, horseshoe bats change their ear shape to modulate the incoming echoes, increasing the information content embedded in them. We believe that this shape change is one of the reasons bats’ sonar performs so well compared with technical sonar systems of similar size.
To test this, we first built a robotic bat head that mimics the ear shape changes we observed in horseshoe bats.
Figure 1: Horseshoe bat inspired robotic set-up used to record speech signal
We then recorded speech signals to explore whether bat-inspired shape change could embed direction-dependent dynamic features into speech signals. Potential applications could range from more accurate hearing aids to machines that can more accurately hear, and learn from, sounds in real-world environments.
We compiled a digital dataset of 11 US English speakers from open-source speech collections provided by Carnegie Mellon University. The human utterances were shifted into the ultrasonic range so that our robot could work with them and play them back into its microphones while the biomimetic bat head actively moved its ears. The signals recorded at the base of the ears were then shifted back into the speech range to recover the speech signal.
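The summary does not say how the speech was moved into the ultrasonic range. One simple possibility, sketched here as an assumption, is single-sideband frequency shifting: build the analytic signal with an FFT, multiply it by a complex exponential at the shift frequency, and keep the real part; shifting down by the same amount recovers the audio band. The 192 kHz sample rate, the 40 kHz shift and the 1 kHz test tone are illustrative.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT: zero negative frequencies, double positive ones."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)

def frequency_shift(x, shift_hz, fs):
    """Shift every frequency component of a real signal by shift_hz (single sideband)."""
    t = np.arange(len(x)) / fs
    return np.real(analytic_signal(x) * np.exp(2j * np.pi * shift_hz * t))

fs = 192_000                                   # sample rate high enough for ultrasound
t = np.arange(fs // 10) / fs                   # 0.1 s of samples
speech_like = np.sin(2 * np.pi * 1000 * t)     # stand-in for one speech component
ultrasonic = frequency_shift(speech_like, 40_000, fs)
restored = frequency_shift(ultrasonic, -40_000, fs)

freqs = np.fft.rfftfreq(len(t), 1 / fs)
peak = freqs[np.argmax(np.abs(np.fft.rfft(ultrasonic)))]
print(f"peak after shift: {peak:.0f} Hz")
```

Unlike replaying a recording at a higher speed, which scales all frequencies by the same factor, a single-sideband shift adds the same offset to every component.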
This pilot study, performed at IBM Research in collaboration with Virginia Tech, showed that the ear shape change was, in fact, able to significantly modulate the signal and concluded that these changes, like in horseshoe bats, embed dynamic patterns into speech signals.
The dynamically enriched data we explored improved the accuracy of speech recognition. Compared with a traditional system for hearing and recognizing speech in noisy environments, adding structural movement to an ear-like outer shape surrounding a microphone significantly improved performance and access to directional information. In the future, this might improve the performance of devices operating in difficult hearing scenarios, such as a busy street in a metropolitan center.
Figure 2: Example of speech signal recorded without and with the dynamic ear. Top row: speech signal without the dynamic ear, Bottom row: speech signal with the dynamic ear