1aNS5 – Noise, vibration, and harshness (NVH) of smartphones

Inman Jang – kgpbjim@yonsei.ac.kr
Tae-Young Park – pty0948@yonsei.ac.kr
Won-Suk Ohm – ohm@yonsei.ac.kr
Yonsei University
50, Yonsei-ro, Seodaemun-gu
Seoul 03722
Korea

Heungkil Park – heungkil.park@samsung.com
Samsung Electro Mechanics Co., Ltd.
150, Maeyeong-ro, Yeongtong-gu
Suwon-si, Gyeonggi-do 16674
Korea

Popular version of paper 1aNS5, “Controlling smartphone vibration and noise”
Presented Monday morning, November 28, 2016
172nd ASA Meeting, Honolulu

Noise, vibration, and harshness, also known as NVH, refers to the comprehensive engineering of the noise and vibration of a device through the stages of production, transmission, and human perception. NVH is a primary concern in the automotive and home-appliance industries because many consumers take the quality of noise into account when making buying decisions. For example, a car that sounds too quiet (unsafe) or too loud (uncomfortable) is a definite turnoff. That said, a smartphone may strike you as an acoustically innocuous device (unless you are not a big fan of Metallica ringtones), for which the application of NVH seems unwarranted. After all, who would expect the roar of a Harley from a smartphone? But think again. Albeit small in amplitude (less than 30 dB), smartphones emit an audible buzz that, because of their close proximity to the ear, can degrade call quality and cause annoyance.


Figure 1: Smartphone noise caused by MLCCs

The major culprit for smartphone noise is the collective vibration of tiny electronic components known as multi-layered ceramic capacitors (MLCCs). An MLCC is basically a capacitor made of piezoelectric ceramics, which expand and contract upon the application of voltage (hence piezoelectric). A typical smartphone has a few hundred MLCCs soldered to the circuit board inside. The almost simultaneous pulsations of these MLCCs are transmitted to and amplified by the circuit board, whose vibration eventually produces the distinct buzzing noise shown in Fig. 1. (Imagine a couple hundred rambunctious little kids jumping up and down on a floor almost in unison!) The problem has been exacerbated by the recent design trend in which the name of the game is “the slimmer the better”: because a slimmer circuit board flexes more easily, it transmits and produces more vibration and noise.

Recently, Yonsei University and Samsung Electro-Mechanics in South Korea joined forces to address this problem. Their comprehensive NVH regime includes the visualization of smartphone noise and vibration (transmission), the identification and replacement of the most problematic MLCCs (production), and the evaluation of the harshness of the smartphone noise (human perception). For visualization of smartphone noise, a technique known as nearfield acoustic holography is used to produce a sound map, as shown in Fig. 2, in which the spatial distribution of sound pressure, acoustic intensity, or surface velocity is overlaid on a snapshot of the smartphone. Such sound maps help smartphone designers draw a detailed mental picture of what is going on acoustically and identify the groups of MLCCs most responsible for the vibration of the circuit board. Engineers can then take corrective action by replacing the (cheap) problematic MLCCs with (expensive) low-vibration MLCCs. Lastly, the outcome of the noise/vibration engineering is measured not only in terms of physical attributes such as sound pressure level, but also in terms of psychological correlates such as loudness and overall psychoacoustic annoyance. This three-pronged strategy (addressing production, transmission, and human perception) has proven highly effective, and Samsung Electro-Mechanics currently offers the NVH service to a number of major smartphone vendors around the world.
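The back-propagation step at the heart of planar nearfield acoustic holography can be sketched in a few lines. The sketch below is a generic textbook formulation, not the team’s actual processing: the measured pressure hologram is decomposed into plane waves with a 2-D FFT, each component is propagated back toward the source plane, and the exponentially growing evanescent components are crudely clipped in place of proper regularization.

```python
import numpy as np

def nah_backpropagate(p_hologram, dx, freq, dz, c=343.0):
    """Back-propagate a complex pressure hologram (N x N grid, spacing dx,
    single frequency) by a distance dz toward the source plane."""
    n = p_hologram.shape[0]
    k = 2 * np.pi * freq / c                       # acoustic wavenumber
    kx = 2 * np.pi * np.fft.fftfreq(n, d=dx)       # spatial frequencies
    kxg, kyg = np.meshgrid(kx, kx, indexing="ij")
    # kz is real for propagating plane waves, imaginary for evanescent ones
    kz = np.sqrt((k**2 - kxg**2 - kyg**2).astype(complex))
    prop = np.exp(-1j * kz * dz)                   # inverse propagator
    prop[np.abs(prop) > 1e3] = 0.0                 # clip exploding evanescent terms
    return np.fft.ifft2(np.fft.fft2(p_hologram) * prop)
```

A reconstruction of this kind, computed from a microphone-array measurement a few millimetres above the circuit board, is what makes a sound map like the one in Fig. 2 possible.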


Figure 2: Sound map of a smartphone surface


2aABa3 – Indris’ melodies are individually distinctive and genetically driven

Marco Gamba – marco.gamba@unito.it
Cristina Giacoma – cristina.giacoma@unito.it

University of Torino
Department of Life Sciences and Systems Biology
Via Accademia Albertina 13
10123 Torino, Italy

Popular version of paper 2aABa3 “Melody in my head, melody in my genes? Acoustic similarity, individuality and genetic relatedness in the indris of Eastern Madagascar”
Presented Tuesday morning, November 29, 2016
172nd ASA Meeting, Honolulu

Humans are exceptionally good at identifying the voices of friends and relatives [1]. The basis for this identification lies in the acoustic structure of our words, which conveys not only verbal information (the meaning of the words) but also non-verbal cues (such as the sex and identity of the speaker).

In animal communication, recognizing a member of the same species can also be important. Birds and mammals may adjust signals that function for neighbor recognition, and the discrimination between a known neighbor and a stranger can result in strikingly different responses in terms of territorial defense [2].

Indris (Indri indri) are the only lemurs that produce group songs and among the few primate species that communicate using articulated singing displays. The most distinctive portions of the indris’ song are called descending phrases, consisting of between two and five units or notes. We recorded 21 groups of indris in the Eastern rainforests of Madagascar from 2005 to 2015. In each recording, we identified individuals using natural markings. We noticed that group encounters were rare, and hypothesized that song might play a role in providing members of the same species with information about the sex and identity of an individual singer and the emitting group.


Figure 1. A female indri with offspring in the Maromizaha Forest, Madagascar. Maromizaha is a New Protected Area located in the Region Alaotra-Mangoro, east of Madagascar. It is managed by GERP (Primate Studies and Research Group). At least 13 species of lemurs have been observed in the area.

We found we could effectively discriminate between the descending phrases of individual indris, showing that these phrases have the potential to advertise sex and individual identity. This strengthened the hypothesis that song may play a role in processes like kinship and mate recognition. Finding that there was a degree of group specificity in the song also supports the idea that neighbor-stranger recognition is important in the indris, and that the song may function in announcing territorial occupation and spacing.


Figure 2. Spectrograms of an indri song showing a typical sequence of different units. In the enlarged area, the pitch contour in red shows a typical “descending phrase” of 4 units. The indris also emit phrases of 2, 3 and more rarely 5 or 6 units.

Traditionally, primate songs are considered an example of a genetically determined display. Thus the following step in our research was to examine whether the structure of the phrases could relate to the genetic relatedness of the indris. We found a significant correlation between the genetic relatedness of the studied individuals and the acoustic similarity of their song phrases. This suggested that genetic relatedness may play a role in determining song similarity.
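A comparison of this kind is commonly carried out as a Mantel-style permutation test between two distance matrices, one acoustic and one genetic. The sketch below is our generic illustration of that procedure, not the authors’ actual analysis code:

```python
import numpy as np

def mantel_test(d_acoustic, d_genetic, n_perm=999, seed=0):
    """Correlate two symmetric distance matrices; assess significance by
    permuting the rows/columns of one matrix."""
    n = d_acoustic.shape[0]
    iu = np.triu_indices(n, k=1)               # each off-diagonal pair once
    def corr(a, b):
        return np.corrcoef(a[iu], b[iu])[0, 1]
    r_obs = corr(d_acoustic, d_genetic)
    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        if corr(d_acoustic, d_genetic[np.ix_(perm, perm)]) >= r_obs:
            exceed += 1
    p_value = (exceed + 1) / (n_perm + 1)
    return r_obs, p_value
```

A significant positive correlation, as the authors report, means that acoustically similar singers tend to be genetic relatives.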

For the first time, we found evidence that the similarity of a primate vocal display varies within a population in a way that is strongly associated with kinship. When examining differences between the sexes, we found that male offspring produced phrases that were more similar to those of their fathers, while daughters did not show similarity to either parent.


Figure 3. A 3D plot of the dimensions (DF1, DF2, DF3) generated from a discriminant model that successfully assigned descending phrases of four units (DP4) to the emitter. Colours denote individuals. The descending phrases of two (DP2) and three units (DP3) also showed correct classification rates significantly above chance.

The potential for kin detection may play a vital role in determining relationships within a population, regulating dispersal, and avoiding inbreeding. Singing displays may advertise kin to signal against potential mating, information that females, and to a lesser degree males, can use when forming a new group. Unfortunately, we still do not know whether indris can perceptually decode this information or how they use it in their everyday life. But work like this sets the basis for understanding primates’ mating and social systems and lays the foundation for better conservation methods.

  1. Belin, P. Voice processing in human and non-human primates. Philosophical Transactions of the Royal Society B: Biological Sciences, 2006. 361: p. 2091-2107.
  2. Randall, J. A. Discrimination of foot drumming signatures by kangaroo rats, Dipodomys spectabilis. Animal Behaviour, 1994. 47: p. 45-54.
  3. Gamba, M., Torti, V., Estienne, V., Randrianarison, R. M., Valente, D., Rovara, P., Giacoma, C. The Indris Have Got Rhythm! Timing and Pitch Variation of a Primate Song Examined between Sexes and Age Classes. Frontiers in Neuroscience, 2016. 10: p. 249.
  4. Torti, V., Gamba, M., Rabemananjara, Z. H., Giacoma, C. The songs of the indris (Mammalia: Primates: Indridae): contextual variation in the long-distance calls of a lemur. Italian Journal of Zoology, 2013. 80, 4.
  5. Barelli, C., Mundry, R., Heistermann, M., Hammerschmidt, K. Cues to androgen and quality in male gibbon songs. PLoS ONE, 2013. 8: e82748.


3pSC87 – What the f***? Making sense of expletives in The Wire

Erica Gold – e.gold@hud.ac.uk
Dan McIntyre – d.mcintyre@hud.ac.uk

University of Huddersfield
Queensgate
Huddersfield, HD1 3DH
United Kingdom

Popular version of 3pSC87 – What the f***: An acoustic-pragmatic analysis of meaning in The Wire
Presented Wednesday afternoon, November 30, 2016
172nd ASA Meeting, Honolulu


In season one of HBO’s acclaimed crime drama The Wire, Detectives Jimmy McNulty and ‘Bunk’ Moreland are investigating old homicide cases, including the murder of a young woman shot dead in her apartment. McNulty and Bunk visit the scene of the crime to try to figure out exactly how the woman was killed. What makes the scene dramatically unusual is that, engrossed in their investigation, the two detectives communicate with each other using only the word “fuck” and its variants (e.g. motherfucker, fuckity fuck, etc.). Somehow, using only this vocabulary, McNulty and Bunk are able to communicate in a meaningful way. The scene is absorbing, engaging and even funny, and it leads to a fascinating question for linguists: how is the viewer able to understand what McNulty and Bunk mean when they communicate using such a restricted set of words?


To investigate this, we first looked at what other linguists have discovered about the word fuck. What is clear is that it’s a hugely versatile word that can be used to express a range of attitudes and emotions. On the basis of this research, we came up with a classification scheme which we then used to categorise all the variants of fuck in the scene. Some seemed to convey disbelief and some were used as insults. Some indicated surprise or realization while others functioned to intensify the following word. And some were idiomatic set phrases (e.g. Fuckin’ A!). Our next step was to see whether there was anything in the acoustic properties of the characters’ speech that would allow us to explain why we interpreted the fucks in the way that we did.

The entire conversation between Bunk and McNulty lasts around three minutes and contains a total of 37 fuck productions (i.e. variations of fuck). Due to the variation in the fucks produced, the one clear and consistent segment for each word was the <u> in fuck. Consequently, this became the focus of our study. The <u> in fuck is the same sound you find in the word strut or duck and is represented as /ᴧ/ in the International Phonetic Alphabet. When analysing vowel sounds, such as <u>, we can look at a number of aspects of its production.

In this study, we looked at the quality of the vowel by measuring the first three formants. In phonetics, the term formant refers to an acoustic resonance of the vocal tract. The first two formants can tell us whether the production sounds more like “fuck” than “feck” or “fack,” and the third formant gives us information about voice quality. We also looked at the duration of the <u> being produced: “fuuuuuck” versus “fuck.”
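Formant measurements of this kind are commonly made with linear predictive coding (LPC), which fits an all-pole model to a vowel frame and reads the resonances off the pole angles. The sketch below is a generic LPC formant estimator; the details (pre-emphasis coefficient, model order, thresholds) are our assumptions, not the authors’ actual pipeline:

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def estimate_formants(frame, fs, order=10):
    """Estimate formant frequencies (Hz) of a vowel frame with an
    all-pole (LPC) model fitted by the autocorrelation method."""
    x = lfilter([1.0, -0.97], [1.0], frame)        # pre-emphasis
    x = x * np.hamming(len(x))                     # taper the frame
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a = solve_toeplitz(r[:order], r[1:order + 1])  # Yule-Walker equations
    poles = np.roots(np.concatenate(([1.0], -a)))  # roots of A(z)
    poles = poles[np.imag(poles) > 0.01]           # one of each conjugate pair
    freqs = np.sort(np.angle(poles) * fs / (2 * np.pi))
    return freqs[freqs > 90.0]                     # drop near-DC poles
```

The two lowest returned frequencies approximate F1 and F2 (the ones that distinguish “fuck” from “feck” or “fack”), and the next one F3.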

After measuring each instance, we ran statistical tests to see if there was any relationship between the way in which it was said and how we categorised its range of meanings. Our results showed that if we accounted for the differences in the vocal tract shapes of the actors playing Bunk and McNulty, the quality of the vowels is relatively consistent. That is, we get a lot of <u> sounds, rather than “eh,” “oo” or “ih.”

The productions of fucks that were associated with the category of realization were found to be very similar to those associated with disbelief. However, disbelief and realization did contrast with those that were used as insults, idiomatic phrases, or functional words. Therefore, it may be more appropriate to classify the meaning into fewer categories – those that signify disbelief or realization, and those that are idiomatic, insults, or functional. It is important to remember, however, that the latter group of three meanings are represented by fewer examples in the scene. Our initial results show that these two broad groups may be distinguished through the length of the vowel – short <u> is more associated with an insult, function, or idiomatic use rather than disbelief or surprise (for which the vowel tends to be longer). In the future, we would also like to analyse the intonation of the productions. See if you can hear the difference between these samples:

Example 1: realization/surprise

Example 2: general expletive which falls under the functional/idiomatic/insult category

Our results shed new light on what for linguists is an old problem: how do we make sense of what people say when speakers so very rarely say exactly what they mean? Experts in pragmatics (the study of how meaning is affected by context) have suggested that we infer meaning when people break conversational norms. In the example from The Wire, it’s clear that the characters are breaking normal communicative conventions. But pragmatic methods of analysis don’t get us very far in explaining how we are able to infer such a range of meaning from such limited vocabulary. Our results confirm that the answer to this question is that meaning is not just conveyed at the lexical and pragmatic level, but at the phonetic level too. It’s not just what we say that’s important, it’s how we fucking say it!

*all photos are from HBO.com

1aSC31 – Shape changing artificial ear inspired by bats enriches speech signals

Anupam K. Gupta1,2, Jin-Ping Han2, Philip Caspers1, Xiaodong Cui2, Rolf Müller1

  1. Dept. of Mechanical Engineering, Virginia Tech, Blacksburg, VA, USA
  2. IBM T. J. Watson Research Center, Yorktown, NY, USA

Contact: Jin-Ping Han – hanjp@us.ibm.com

Popular version of paper 1aSC31, “Horseshoe bat inspired reception dynamics embed dynamic features into speech signals.”
Presented Monday morning, November 28, 2016
172nd ASA Meeting, Honolulu

Have you ever had difficulty understanding what someone was saying to you while walking down a busy big city street, or in a crowded restaurant? Even if that person was right next to you? Words can become difficult to make out when they get jumbled with the ambient noise – cars honking, other voices – making it hard for our ears to pick up what we want to hear. But this is not so for bats. Their ears can move and change shape to precisely pick out specific sounds in their environment.

This biosonar capability inspired our artificial-ear research into improving the accuracy of automatic speech recognition (ASR) systems and speaker localization. We asked: could we enrich a speech signal with direction-dependent, dynamic features by using bat-inspired reception dynamics?

Horseshoe bats, for example, are found throughout Africa, Europe and Asia, and are so named for the shape of their noses. They can change the shape of their outer ears (something other mammals cannot do) to help extract additional information about the environment from incoming ultrasonic echoes. Their sophisticated biosonar systems emit ultrasonic pulses and listen to the echoes that reflect back from surrounding objects while the ears change shape. This allows them to learn about the environment, helping them navigate and hunt in their home of dense forests.

While probing the environment, horseshoe bats change their ear shape to modulate the incoming echoes, increasing the information content embedded in them. We believe that this shape change is one of the reasons bats’ sonar exhibits such high performance compared to technical sonar systems of similar size.

To test this, we first built a robotic bat head that mimics the ear shape changes we observed in horseshoe bats.

han1 - bats

Figure 1: Horseshoe bat inspired robotic set-up used to record speech signal

We then recorded speech signals to explore if using shape change, inspired by the bats, could embed direction-dependent dynamic features into speech signals. The potential applications of this could range from improving hearing aid accuracy to helping a machine more-accurately hear – and learn from – sounds in real-world environments.

We compiled a digital dataset of 11 US English speakers from open-source speech collections provided by Carnegie Mellon University. The utterances were shifted to the ultrasonic domain so that our robot could play the sounds back into its microphones while the biomimetic bat head actively moved its ears. The signals recorded at the base of the ears were then shifted back down to the speech domain to extract the original signal.

This pilot study, performed at IBM Research in collaboration with Virginia Tech, showed that the ear-shape change was, in fact, able to significantly modulate the signal, and concluded that these changes, as in horseshoe bats, embed dynamic patterns into speech signals.
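The up- and down-shifting between the speech and ultrasonic domains can be done with single-sideband (heterodyne) modulation. The sketch below is our generic illustration of such a shifter; the authors’ actual processing chain is not specified in this summary:

```python
import numpy as np
from scipy.signal import hilbert

def frequency_shift(x, fs, shift_hz):
    """Shift the entire spectrum of x upward by shift_hz (or downward if
    shift_hz is negative) using single-sideband (analytic-signal) modulation."""
    t = np.arange(len(x)) / fs
    return np.real(hilbert(x) * np.exp(2j * np.pi * shift_hz * t))
```

In a setup like the one described, speech shifted up by some tens of kilohertz before playback and back down by the same amount after recording would carry the ear’s direction-dependent filtering into the recovered speech band.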

The dynamically enriched data we explored improved the accuracy of speech recognition. Compared to a traditional system for hearing and recognizing speech in noisy environments, adding structural movement to a complex outer shape surrounding a microphone, mimicking an ear, significantly improved its performance and access to directional information. In the future, this might improve performance in devices operating in difficult hearing scenarios like a busy street in a metropolitan center.

han2

Figure 2: Example of speech signal recorded without and with the dynamic ear. Top row: speech signal without the dynamic ear, Bottom row: speech signal with the dynamic ear

4pEA7 – Acoustic Cloaking Using the Principles of Active Noise Cancellation

Jordan Cheer – j.cheer@soton.ac.uk
Institute of Sound and Vibration Research
University of Southampton
Southampton, UK 

Popular version of paper 4pEA7, “Cancellation, reproduction and cloaking using sound field control”
Presented Thursday morning, December 1, 2016
172nd ASA Meeting, Honolulu

Loudspeakers are synonymous with audio reproduction and are widely used to play sounds people want to hear. Loudspeakers have also been used for the opposite purpose, to attenuate noise that people may not want to hear. Active noise cancellation technology is an example of this, which combines loudspeakers, microphones and digital signal processing to adaptively control unwanted noise sources [1].

More recently, the scientific community has focused attention on controlling and manipulating sound fields to acoustically cloak objects, with the aim of rendering them acoustically invisible. A new class of engineered materials called metamaterials has already demonstrated this ability [2]. However, acoustic cloaking has also been demonstrated using methods based on both sound field reproduction and active noise cancellation [3]. Despite these demonstrations, there has been limited research exploring the physical links between acoustic cloaking, active noise cancellation and sound field reproduction. Therefore, we began exploring these links with the aim of developing active acoustic cloaking systems that build on the advanced knowledge of implementing both audio reproduction and active noise cancellation systems.

Acoustic cloaking attempts to control the sound scattered from a solid object. Using a numerical computer simulation, we therefore investigated the physical limits on active acoustic cloaking in the presence of a rigid scattering sphere. The scattering sphere, shown in Figure 1, was surrounded by an array of sources (loudspeakers) used to control the sound field, shown by the black dots surrounding the sphere in the figure. In the first instance we investigated the effect of the scattering sphere on a simple sound field.

Comparing a horizontal slice through the simulated sound field without a scattering object, shown in the second figure, to the same slice when the object is present, shown in the third figure, the modifications caused by the scattering sphere are obvious. Scattering from the sphere distorts the sound field, rendering it acoustically visible.

figure1 - Acoustic Cloaking

Figure 1 – The geometry of the rigid scattering sphere and the array of sources, or loudspeakers used to control the sound field (black dots).

figure2 - Acoustic Cloaking

Figure 2 – The sound field due to an acoustic plane wave in the free field (without scattering).

figure3 - Acoustic Cloaking

Figure 3 – The sound field produced when an acoustic plane wave is incident on the rigid scattering sphere.

figure4 - Acoustic Cloaking

Figure 4 – The sound field produced when active acoustic cloaking is used to attempt to cancel the sound field scattered by a rigid scattering sphere and thus render the scattering sphere acoustically ‘invisible’.

To understand the physical limitations on controlling this sound field, and thus on implementing an active acoustic cloak, we investigated the ability of the array of loudspeakers surrounding the scattering sphere to achieve acoustic cloaking [4]. In contrast to active noise cancellation, rather than attempting to cancel the total sound field, we attempted to control only the scattered component of the sound field and thus render the sphere acoustically invisible.
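In a discretized simulation, choosing the control source strengths reduces to a least-squares problem, just as in multichannel active noise cancellation, except that the target is the scattered component alone. The sketch below is our illustration of that idea; the variable names and formulation are ours, not necessarily those of the paper:

```python
import numpy as np

def cloaking_source_strengths(Z, p_scattered):
    """Complex source strengths q minimizing ||p_scattered + Z q||^2,
    where Z maps the control sources to the pressures at the error
    sensors and p_scattered is the scattered field to be cancelled."""
    q, *_ = np.linalg.lstsq(Z, -p_scattered, rcond=None)
    return q
```

With enough well-placed sources, the residual scattered pressure at the sensors approaches zero and the total field approaches the free-field result of Figure 2.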

With active acoustic cloaking, the scattered component is significantly attenuated and the sound field appears undisturbed, resulting in a field, shown in the fourth figure, that is indistinguishable from the object-less simulation of Figure 2.

Our results indicate that active acoustic cloaking can be achieved using an array of loudspeakers surrounding a sphere that would otherwise scatter sound detectably. In this and related work [4], further investigations showed that active acoustic cloaking is most effective when the loudspeakers are in close proximity to the object being cloaked. This may lead to design concepts in which acoustic sources are embedded in objects for acoustic cloaking or control of the scattered sound field.

Future work will attempt to demonstrate the performance of active acoustic cloaking experimentally, overcoming the significant challenge of not only controlling the scattered sound field but also detecting it using an array of microphones.

[1]   P. Nelson and S. J. Elliott, Active Control of Sound, 436 (Academic Press, London) (1992).

[2]   L. Zigoneanu, B.I. Popa, and S.A. Cummer, “Three-dimensional broadband omnidirectional acoustic ground cloak”. Nat. Mater, 13(4), 352-355, (2014).

[3]   E. Friot and C. Bordier, “Real-time active suppression of scattered acoustic radiation”, J. Sound Vib., 278, 563–580 (2004).

[4]   J. Cheer, “Active control of scattered acoustic fields: Cancellation, reproduction and cloaking”, J. Acoust. Soc. Am., 140 (3), 1502-1512 (2016).