5aSC43 – Appropriateness of acoustic characteristics on perception of disaster warnings

Naomi Ogasawara – naomi-o@mail.gpwu.ac.jp
Kenta Ofuji – o-fu@u-aizu.ac.jp
Akari Harada

Popular version of paper, 5aSC43, “Appropriateness of acoustic characteristics on perception of disaster warnings.”
Presented Friday morning, December 2, 2016
172nd ASA Meeting, Honolulu

As you might know, Japan is frequently hit by natural disasters such as typhoons, earthquakes, flooding, landslides, and volcanic eruptions. According to the Japan Institute of Country-ology and Engineering [1], 20.5% of all magnitude-6 or greater earthquakes in the world occur in Japan, and 0.3% of deaths caused by natural disasters worldwide occur in Japan. These figures seem quite high given that Japan occupies only 0.28% of the world’s land mass.

Municipalities in Japan issue and announce evacuation calls to local residents through community wireless systems or home receivers when a disaster is approaching; however, there have been many reported cases in which people did not evacuate even after hearing the warnings [2]. This is because people tend to disbelieve and disregard warnings, a tendency known as normalcy bias [3]. Facing this reality, it is necessary to find ways to make evacuation calls more effective and trustworthy. This study focused on how the acoustic characteristics of a warning call (voice gender, pitch, and speaking rate) influence listeners’ perception of the call, and offers suggestions for better communication.

Three short warnings were created:

  1. Kyoo wa ame ga furimasu. Kasa wo motte dekakete kudasai. ‘It’s going to rain today. Please take an umbrella with you.’
  2. Ookina tsunami ga kimasu. Tadachini hinan shitekudasai. ‘A big tsunami is coming. Please evacuate immediately.’ and
  3. Gakekuzure no kiken ga arimasu. Tadachini hinan shitekudasai. ‘There is a risk of landslide. Please evacuate immediately.’

A female and a male native speaker of Japanese, both with relatively clear voices and good articulation, read the warnings aloud at a normal speed (see Table 1 for acoustic measurements of the utterances), and their utterances were recorded in a sound-attenuated booth with a high-quality microphone and recorder. Each female and male utterance was then modified with the acoustic analysis software PRAAT [4] to create stimuli with 20% higher or lower pitch and a 20% faster or slower speaking rate. In total, 54 tokens were created (3 warnings × 2 voice genders × 3 pitch levels × 3 speaking rates); only 4 of the warning-1 tokens were used in the perception experiment, as practice stimuli.
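
For readers who want to reproduce this kind of manipulation, here is a minimal sketch of the ±20% pitch and rate changes using Praat’s manipulation commands through the Python wrapper parselmouth; the input file name and the 75-600 Hz pitch search range are our assumptions, not details reported in the study.

```python
# Sketch of the +/-20% pitch and speech-rate manipulation described above,
# using Praat via the parselmouth library (pip install praat-parselmouth).
# "warning.wav" and the 75-600 Hz pitch search range are assumptions.
import parselmouth
from parselmouth.praat import call

def modify(sound, pitch_factor=1.0, rate_factor=1.0):
    """Return a resynthesized Sound with scaled pitch and speaking rate."""
    manip = call(sound, "To Manipulation", 0.01, 75, 600)

    # Scale every pitch point by pitch_factor (e.g., 1.2 = 20% higher).
    pitch_tier = call(manip, "Extract pitch tier")
    call(pitch_tier, "Multiply frequencies", sound.xmin, sound.xmax, pitch_factor)
    call([pitch_tier, manip], "Replace pitch tier")

    # A duration factor of 1/rate_factor makes speech rate_factor times faster.
    dur_tier = call("Create DurationTier", "dur", sound.xmin, sound.xmax)
    call(dur_tier, "Add point", sound.xmin, 1.0 / rate_factor)
    call([dur_tier, manip], "Replace duration tier")

    return call(manip, "Get resynthesis (overlap-add)")

snd = parselmouth.Sound("warning.wav")
for p in (0.8, 1.0, 1.2):        # 20% lower, normal, 20% higher pitch
    for r in (0.8, 1.0, 1.2):    # 20% slower, normal, 20% faster rate
        modify(snd, p, r).save(f"warning_p{p}_r{r}.wav", "WAV")
```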


Table 1: Acoustic Data of Normal Tokens

Thirty-four university students listened to each stimulus through two speakers placed in the right and left front corners of a classroom (930 cm × 1,500 cm). Another group, 42 students and 11 members of the public, listened to the same stimuli through a single speaker placed at the front of a lab (510 cm × 750 cm). All participants rated each token on a 1-to-5 scale (1: lowest, 5: highest) for Intelligibility, Reliability, and Urgency.

Figure 1 summarizes the evaluation responses (n = 87) in a bar chart, with average scores calculated from the 1-5 ratings for each combination of acoustic conditions. For Intelligibility, for example, the average score was highest when the calls were spoken in a female voice at normal speed and normal pitch. The results for Reliability were similar. In contrast, respondents perceived greater Urgency with both faster speed and higher pitch.


Figure 1. Evaluation responses (bar graph, in percent) and average scores (data labels and line graph, on a 1-5 scale)

The data were then analyzed with an analysis of variance (ANOVA; Table 2); Figure 2 illustrates the same results as bar charts. For all three measures (Intelligibility, Reliability, and Urgency), the main effect of speaking speed was the most dominant. In particular, speed alone accounted for up to 43% of the variance in Urgency.
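
As an illustration of how such a variance decomposition can be obtained, here is a hedged sketch using pandas and statsmodels; the data file and column names are hypothetical, and the actual analysis (for instance, interaction terms or repeated-measures corrections) may have differed.

```python
# Sketch of a three-factor ANOVA with an eta-squared variance decomposition,
# in the spirit of Table 2 and Figure 2. "ratings.csv" and the column names
# are hypothetical: one row per 1-5 rating, with the factor levels as columns.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

df = pd.read_csv("ratings.csv")

model = ols("urgency ~ C(gender) + C(pitch) + C(speed)", data=df).fit()
anova = sm.stats.anova_lm(model, typ=2)

# Eta-squared: each factor's share of the total sum of squares.
anova["eta_sq"] = anova["sum_sq"] / anova["sum_sq"].sum()
print(anova)  # a "speed" share near 0.43 would match the Urgency result
```

The expected average scores in Figure 3 can then be read off the same fitted model, for example with model.predict() on the female-voice, normal-pitch factor combinations.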


Table 2: ANOVA results


Figure 2: Decomposed variances in stacked bar charts based on the ANOVA results

Finally, to gauge how much influence speed has on Urgency, we calculated the expected average evaluation scores at each speed level, holding the voice female and the pitch normal (Figure 3). Setting the speed to fast raises perceived Urgency to the highest level, though at some expense to Intelligibility and Reliability. Based on these results, we argue that the speaking rate may effectively be varied depending on the purpose of an evacuation call: whether it prioritizes Urgency, or Intelligibility and Reliability.


Figure 3: Expected average evaluation scores on a 1-5 scale, for a female voice and normal pitch

References

  1. Japan Institute of Country-ology and Engineering (2015). Kokudo wo shiru [To know the national land]. Retrieved from: http://www.jice.or.jp/knowledge/japan/commentary09.
  2. Nakamura, Isao. (2008). Dai 6 sho Hinan to joho, dai 3 setsu Hinan to jyuumin no shinri [Chapter 6 Evacuation and Information, Section 3 Evacuation and Residents’ Mind]. In H. Yoshii & A. Tanaka (Eds.), Saigai kiki kanriron nyuumon [Introduction to Disaster Management Theory] (pp.170-176). Tokyo: Kobundo.
  3. Drabek, Thomas E. (1986). Human System Responses to Disaster: An Inventory of Sociological Findings. NY: Springer-Verlag New York Inc.
  4. Boersma, Paul & Weenink, David (2013). Praat: doing phonetics by computer [Computer program]. Retrieved from: http://www.fon.hum.uva.nl/praat/.

Tags:
-Emergency warnings/response
-Natural disasters
-Broadcasting
-Speech rate
-Pitch

4pBA1 – Kidney stone pushing and trapping using focused ultrasound beams of different structure

Oleg Sapozhnikov – olegs@apl.washington.edu
Mike Bailey – bailey@apl.washington.edu
Adam Maxwell – amax38@uw.edu

Physics Faculty
Moscow State University
Moscow
RUSSIAN FEDERATION

Center for Industrial and Medical Ultrasound
Applied Physics Laboratory
University of Washington
Seattle, Washington, UNITED STATES

Popular version of paper 4pBA1, “Kidney stone pushing and trapping using focused ultrasound beams of different structure.”
Presented Thursday afternoon, December 1, 2016, at 1:00 pm HAST.
172nd ASA Meeting, Honolulu

Urinary stones (such as kidney or bladder stones) are an important health care problem. One in 11 Americans now has urinary stone disease (USD), and the prevalence is increasing. According to a 2012 report from the National Institute of Diabetes and Digestive and Kidney Diseases (Urological Diseases in America), the direct medical cost of USD in the United States is $10 billion annually, making it the most expensive urologic condition.

Our lab is working to develop more effective and more efficient ways to treat stones. Existing treatments such as shock wave lithotripsy and ureteroscopy are minimally invasive, but they can leave behind stone fragments that remain in the kidney and can regrow into larger stones over time. We have developed and demonstrated the use of ultrasound to noninvasively move stones in the kidneys of human subjects. This technology, called ultrasonic propulsion (UP), uses ultrasound to apply a directional force to the stone, propelling it away from the sonic source, or transducer. Some stones need to be moved toward the ureteropelvic junction (the exit from the kidney that allows stones to pass through the ureter to the bladder) to aid their passage. In other cases, this technology may help relieve an obstruction caused by a stone that needs only a nudge or a rotation to pass, or at least to allow urine to flow and decompress the kidney.

While UP can help stones pass, it is limited in how stones can be manipulated by an ultrasound transducer in contact with the skin from outside the body: some applications require moving the stone sideways or toward the transducer rather than away from it.

To achieve more versatile manipulation of stones, we are developing a new strategy to effectively trap a stone in an ultrasound beam. Acoustic trapping has been explored by several other researchers, particularly for trapping and manipulating cells, bubbles, droplets, and particles much smaller than the wavelength of the sound. Different configurations have been used to trap particles in standing waves and focused fields. By trapping the stone in an ultrasound beam, we can then move the transducer, or electronically steer the beam, to move the stone with it.


Figure 1. Cross section of the ultrasound pressure of a vortex beam at the focus. The pressure magnitude (left) has a donut-shaped distribution, whereas the phase (right) has a spiral structure. A stone can be trapped at the center of the ring.

In this work, we accomplished trapping through the use of vortex beams. A typical focused beam creates a single region of high ultrasound pressure, producing a radiation force directed away from the transducer. A vortex beam, in contrast, is a focused beam that creates a tube-shaped intensity pattern with a spiraling wave front (Fig. 1). The ultrasound pressure in the middle is very low, while the pressure around the center is high. As a result, a component of the radiation force pushes the stone toward the center, trapping it in the middle of the beam. In addition to trapping, such a beam can apply a torque to the stone and rotate it.
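
To make the donut-and-spiral structure of Fig. 1 concrete, here is a small sketch that plots an idealized focal cross-section, approximating the focused vortex by a Bessel-beam profile; the frequency, aperture angle, and charge are assumed values for illustration, not our experimental parameters.

```python
# Idealized cross-section of a vortex beam at the focus. A focused vortex of
# charge m is approximated by a Bessel profile Jm(k*r*sin(alpha)) * exp(i*m*theta).
# All numerical values are assumptions chosen for illustration.
import numpy as np
from scipy.special import jv
import matplotlib.pyplot as plt

f = 1.5e6                 # ultrasound frequency, Hz (assumed)
c = 1500.0                # sound speed in water, m/s
k = 2 * np.pi * f / c     # wavenumber
alpha = np.deg2rad(45)    # focusing aperture half-angle (assumed)
m = 1                     # vortex charge: spiral rate of the wave front

x = np.linspace(-3e-3, 3e-3, 401)              # +/- 3 mm window
X, Y = np.meshgrid(x, x)
r, theta = np.hypot(X, Y), np.arctan2(Y, X)
p = jv(m, k * r * np.sin(alpha)) * np.exp(1j * m * theta)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
ax1.imshow(np.abs(p), extent=[-3, 3, -3, 3])    # axes in mm
ax1.set_title("|p|: donut")
ax2.imshow(np.angle(p), extent=[-3, 3, -3, 3])
ax2.set_title("phase: spiral")
plt.show()
```

In this idealized model, increasing the charge m moves the first ring of the Bessel function outward, consistent with the charge controlling the beam diameter in the experiments described below.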

To test this idea, we simulated the radiation force on spheres of different materials (including stones) to determine how each would respond in a vortex beam. An example is shown in Fig. 2: a lateral-axial cross section of the beam is displayed, with a spherical stone off-center in the tube-shaped beam. The red arrow shows that the force on the sphere points away from the center because the stone lies outside of the vortex; once the center of the stone crosses the pressure peak, the force is directed inward. There is usually still some residual force away from the transducer, but the object can be trapped against a surface.

Figure 2. Simulated tubular field, such as occurs in a vortex beam, and its force on a stone. In this simulation the transducer is on the left and the ultrasound propagates to the right. The arrow indicates the force, which depends on the position of the stone.

We also built transducers and electrically excited them to generate the vortex in experiments. At first, we used the vortex to trap, rotate, and drag an object on the water surface (Fig. 3). By changing the charge of the vortex beam (the rate of spiraling generated by the transducer), we controlled the diameter of the beam as well as the direction and speed at which objects rotated. We also tested manipulation of objects placed deep in a water tank: glass or plastic beads and kidney stones were placed on a platform of tissue-mimicking material, and by physically shifting the transducer we were able to move these objects a specified distance and direction along the platform (Fig. 4). These results are best seen in the videos at apl.uw.edu/pushingstones.

Figure 3. A small object trapped in the center of a vortex beam on the water surface. The ring-shaped impression due to radiation force on the surface can be seen. The phase difference between each sector element of the transducer affects the diameter of the beam and the spin rate. The 2 mm plastic object floating on the surface is made to rotate by the vortex beam.

Figure 4. A focused vortex-beam transducer in water (top) traps one of the styrofoam beads (bottom) and translates it in the lateral direction.

We have since worked on generating vortex beams with a 256-element focused array transducer. This array can electronically steer the beam and drag the stone without physically moving the transducer. With a highly focused transducer such as our array, sound can even be focused beyond the stone to generate a high-pressure spot on the axis that helps trap the stone axially, or even pull it toward the transducer.
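
As a sketch of what electronic steering involves, the snippet below computes per-element drive phases for a vortex beam focused at a chosen point: a focusing term that equalizes each element’s path length to the focus, plus an azimuthal ramp mθ that creates the spiral. The element layout and all numbers are hypothetical, not our array’s geometry.

```python
# Sketch of per-element drive phases that steer a focused vortex beam with a
# 256-element array. The flat-disc element layout is a hypothetical stand-in
# for a real (typically spherically curved) aperture.
import numpy as np

c, f = 1500.0, 1.5e6           # water sound speed (m/s), frequency (Hz, assumed)
k = 2 * np.pi * f / c
m = 1                          # vortex charge

rng = np.random.default_rng(0)
rad = 0.05 * np.sqrt(rng.random(256))      # 256 elements on a 5 cm disc
ang = 2 * np.pi * rng.random(256)
ex, ey, ez = rad * np.cos(ang), rad * np.sin(ang), np.zeros(256)

focus = np.array([0.0, 0.0, 0.06])         # focal point 6 cm away (assumed)

# Focusing term: compensate each element's path length to the focus,
# then add the azimuthal ramp m*theta that makes the wave front spiral.
dist = np.sqrt((ex - focus[0])**2 + (ey - focus[1])**2 + (ez - focus[2])**2)
phase = np.mod(-k * dist + m * np.arctan2(ey, ex), 2 * np.pi)

# Moving `focus` re-steers the beam electronically, dragging a trapped stone.
```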

There are several ways in which this technology might be useful for kidney stones: gathering small particles together and moving them collectively, holding a larger stone in place for fragmentation techniques such as lithotripsy, sweeping a stone when the anatomy prevents applying direct radiation force away from the transducer, or, as addressed here, dragging or pulling a stone. In the future, we expect to continue developing the phased-array technology to manipulate stones more controllably. We are also working to develop and validate new beam profiles and electronic controls to remotely gather stones and move them to a new location. We expect that this sort of tractor beam could also have applications in manufacturing, such as ever-shrinking electronics, and even in space.

This work was supported by RFBR 14-02-00426, NIH NIDDK DK43881 and DK104854, and NSBRI through NASA NCC 9-58.

References

  1. Sapozhnikov, O.A., and Bailey, M.R. (2013). Radiation force of an arbitrary acoustic beam on an elastic sphere in a fluid. J. Acoust. Soc. Am., 133(2), 661-676.
  2. Nikolaeva, A.V., Tsysar, S.A., and Sapozhnikov, O.A. (2016). Measuring the radiation force of megahertz ultrasound acting on a solid spherical scatterer. Acoustical Physics, 62(1), 38-45.
  3. Harper, J.D., Cunitz, B.W., Dunmire, B., Lee, F.C., Sorensen, M.D., Hsi, R.S., Thiel, J., Wessells, H., Lingeman, J.E., and Bailey, M.R. (2016). First in human clinical trial of ultrasonic propulsion of kidney stones. J. Urology, 195(4, Part 1), 956-964.

5aEA2 – What Does Your Signature Sound Like?

Daichi Asakura – asakura@pa.info.mie-u.ac.jp
Mie University
Tsu, Mie, Japan

Popular version of poster 5aEA2, “Writer recognition with a sound in hand-writing.”
172nd ASA Meeting, Honolulu

We can notice a car approaching by the noise it makes on the road, or recognize a person by the sound of their footsteps. Many studies have analyzed and recognized such noises. In the computer security field, studies have even shown how to estimate what is being typed from the sound of typing on a keyboard [1] and how to extract RSA keys from the noises made by a PC [2].

There is, of course, a relationship between a noise and its cause, and a noise therefore carries information. The sound of a person writing, or “handwriting sound,” is one such noise in our everyday environment. A previous study addressed the recognition of handwritten numeric characters from the resulting sound, reporting an average recognition rate of 88.4% [3]. Building on that study, we explored the possibility of recognizing and identifying a writer from the sound of their handwriting. If accurate identification is possible, it could become a method of signature verification that never needs to look at the signature itself.

We recorded the handwriting sounds of nine participants and conducted recognition experiments. We asked each participant to write the same text, names in Kanji (Chinese characters), under several different conditions, such as writing slowly or writing on a different day. Figure 1 shows an example spectrogram of a handwriting sound we analyzed: the horizontal axis represents time and the vertical axis frequency, with color representing the magnitude, or intensity, at each frequency, from red (high) to blue (low).
Figure 1: Spectrogram of a handwriting sound.
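
A spectrogram like the one in Figure 1 can be computed in a few lines; this sketch assumes a mono recording in a hypothetical file handwriting.wav.

```python
# Minimal spectrogram of a handwriting-sound recording (cf. Figure 1).
# Assumes a mono WAV file; "handwriting.wav" is a hypothetical name.
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
from scipy.signal import spectrogram

sr, y = wavfile.read("handwriting.wav")
freqs, times, Sxx = spectrogram(y, fs=sr, nperseg=1024, noverlap=768)

# Plot power in dB; the small constant avoids log(0) in silent frames.
plt.pcolormesh(times, freqs, 10 * np.log10(Sxx + 1e-12), shading="auto")
plt.xlabel("Time (s)"); plt.ylabel("Frequency (Hz)")
plt.show()
```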

The spectrogram shows features corresponding to the number of strokes in the Kanji. We used a recognition system based on a hidden Markov model (HMM), a technique typically used for speech recognition, which represents transitions of spectral patterns as they evolve in time. The results showed an average identification rate of 66.3%, indicating that writer identification is possible in this manner. However, the identification rate decreased under certain conditions, especially slow writing.
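
Below is a hedged sketch of this kind of HMM-based writer identification: one Gaussian HMM is trained per writer, and a test sound is assigned to the writer whose model scores it highest. The MFCC features (via librosa), the number of states, and the data layout are our assumptions, not the exact recipe of the study.

```python
# Sketch of writer identification with one Gaussian HMM per writer.
# Feature choice (MFCCs) and model size are illustrative assumptions.
import numpy as np
import librosa
from hmmlearn import hmm

def mfcc_features(path):
    y, sr = librosa.load(path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T   # (frames, 13)

# train_files: {writer_name: [training wav paths]} (hypothetical layout)
def train_models(train_files, n_states=5):
    models = {}
    for writer, paths in train_files.items():
        feats = [mfcc_features(p) for p in paths]
        X, lengths = np.vstack(feats), [len(f) for f in feats]
        model = hmm.GaussianHMM(n_components=n_states,
                                covariance_type="diag", n_iter=20)
        model.fit(X, lengths)          # one model per writer
        models[writer] = model
    return models

def identify(models, path):
    feats = mfcc_features(path)
    # Pick the writer whose HMM assigns the highest log-likelihood.
    return max(models, key=lambda w: models[w].score(feats))
```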

To improve performance, we need more handwriting samples, more varied written texts, and more participants. We also intend to include the writing of English characters and numbers. We expect that deep learning, which is attracting increasing attention around the world, will also help us achieve a higher recognition rate in future experiments.

References

  1. Zhuang, L., Zhou, F., and Tygar, J. D., Keyboard Acoustic Emanations Revisited, ACM Transactions on Information and Systems Security, 2009, vol.13, no.1, article 3, pp.1-26.
  2. Genkin, D., Shamir, A., and Tromer, E., RSA Key Extraction via Low-Bandwidth Acoustic Cryptanalysis, Proceedings of CRYPTO 2014, 2014, pp.444-461.
  3. Kitano, S., Nishino, T. and Naruse, H., Handwritten digit recognition from writing sound using HMM, 2013, Technical Report of the Institute of Electronics, Information and Communication Engineers, vol.113, no.346, pp.121-125.

3pAB4 – Automatic classification of fish sounds for environmental purposes

Marielle Malfante – marielle.malfante@gipsa-lab.grenoble-inp.fr
Jérôme Mars – jerome.mars@gipsa-lab.grenoble-inp.fr
Mauro Dalla Mura – mauro.dalla-mura@gipsa-lab.grenoble-inp.fr
Cédric Gervaise – cedric.gervaise@gipsa-lab.grenoble-inp.fr

GIPSA-Lab
Université Grenoble Alpes (UGA)
11 rue des Mathématiques
38402 Saint Martin d’Hères (GRENOBLE)
FRANCE

Popular version of paper 3pAB4 “Automatic fish sounds classification”
Presented Wednesday afternoon, May 25, 2016, 2:15 in Salon I
171st ASA Meeting, Salt Lake City

In the current context of global warming and environmental concern, we need tools to evaluate and monitor the evolution of our environment. The evolution of animal populations is of special concern, both to detect changes of behaviour under environmental stress and to preserve biodiversity. Monitoring animal populations, however, can be a complex and costly task. Experts can either (1) monitor animal populations directly in the field, or (2) use sensors to gather data in the field (audio or video recordings, trackers, etc.) and then process those data to retrieve knowledge about the population. In both cases the issue is the same: experts are needed, and they can only process a limited quantity of data.

An alternative idea is to keep using field sensors but to build software tools that process the data automatically, thereby allowing animal populations to be monitored over larger geographic areas and for longer time periods.

The work we present is about automatically monitoring fish populations using audio recordings. Sound propagates well underwater: by recording sounds under the sea, we can gather a wealth of information about the environment and the animal species it shelters. Here is an example of such a recording:

Legend: Raw recording of fish sounds, August 2014, Corsica, France.

Regarding fish populations, we distinguish four types of sounds, which we call (1) Impulsions, (2) Roars, (3) Drums and (4) Quacks. All of them can be heard in the previous recording, but here are some extracts with isolated examples:

Legend: Filtered recording of fish sounds; a Roar can be heard from 5 s to 13 s, and Drums from 22 s to 29 s and from 42 s to 49 s.

Legend: Filtered recording of fish sounds with Quacks and Impulsions. Both sounds are quite short (<0.5 s) and can be heard throughout the recording.

Making a computer automatically classify a fish sound into one of those four groups, however, is a very complex task. A task that is simple or intuitive for a human is often extremely complex for a computer, and vice versa, because humans and computers process information in different ways. A computer is very good at solving complex calculations and performing repetitive tasks, but it is very difficult to make a computer recognize a car in a picture. Humans, on the other hand, tend to struggle with complex calculations but recognize objects in images effortlessly. How do you explain to a computer that ‘this is a car’? It has four wheels. But then, how do you know what a wheel is? Well, it has a circular shape. Oh, so this ball is a wheel, isn’t it?

This easy task for a human is very complex for a machine. Scientists have found a way to make computers handle what we call ‘high-level concepts’ (recognising objects in pictures, understanding speech, etc.): algorithms known as machine learning. The idea is to give the computer many examples of each concept we want to teach it. To make a computer recognise a car in a picture, we feed it many pictures of cars so that it can learn what a car is, and many pictures without cars so that it can learn what a car is not. Many companies, such as Facebook, Google, and Apple, use these algorithms for face recognition, speech understanding, individualised advertisement, and so on. It works very well.

In our work, we use the same techniques to teach a computer to recognize and automatically classify fish sounds. Once the sounds have been classified, we can study their evolution and see whether fish populations behave differently from place to place, or whether their behaviour evolves with time. It is also possible to study their density and see whether their numbers vary through time.

This work is of particular interest since, to our knowledge, we present the first tool to automatically classify fish sounds. One of the main challenges is to make a sound understandable by a computer, that is, to find and extract relevant information from the acoustic signal. With a good representation, it becomes easier for the computer to capture similarities and differences between signals and, ultimately, to predict the group to which a sound belongs.
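
As a sketch of such a pipeline (illustrated below), each recorded sound is reduced to a fixed-length feature vector, and a supervised classifier is then trained on labelled examples. The MFCC summary features and the support-vector machine here are stand-in choices for illustration; the paper’s actual feature set is not detailed in this summary.

```python
# Sketch of an automatic fish-sound classifier: fixed-length features per
# sound, then a supervised classifier. Features and classifier are stand-ins.
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

LABELS = ["impulsion", "roar", "drum", "quack"]

def features(path):
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    # Summarize each coefficient over time to get one vector per sound.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# files, labels: lists of annotated sound snippets (hypothetical data)
def train(files, labels):
    X = np.array([features(f) for f in files])
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.3,
                                              random_state=0)
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    clf.fit(X_tr, y_tr)
    print("held-out accuracy:", clf.score(X_te, y_te))
    return clf
```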

Legend: How to build an automatic fish sound classifier (illustration).

4pMU4 – How Well Can a Human Mimic the Sound of a Trumpet?

Ingo R. Titze – ingo.titze@utah.edu

University of Utah
201 Presidents Cir
Salt Lake City, UT

Popular version of paper 4pMU4 “How well can a human mimic the sound of a trumpet?”
Presented Thursday May 26, 2:00 pm, Solitude room
171st ASA Meeting Salt Lake City

Man-made musical instruments are sometimes designed or played to mimic the human voice, and likewise vocalists try to mimic the sounds of man-made instruments.  If flutes and strings accompany a singer, a “brassy” voice is likely to produce mismatches in timbre (tone color or sound quality).  Likewise, a “fluty” voice may not be ideal for a brass accompaniment.  Thus, singers are looking for ways to color their voice with variable timbre.

Acoustically, brass instruments are close cousins of the human voice.  It was discovered prehistorically that sending sound over long distances (to locate, be located, or warn of danger) is made easier when a vibrating sound source is connected to a horn.  It is not known which came first – blowing hollow animal horns or sea shells with pursed and vibrating lips, or cupping the hands to extend the airway for vocalization. In both cases, however, airflow-induced vibration of soft tissue (vocal folds or lips) is enhanced by a tube that resonates the frequencies and radiates them (sends them out) to the listener.

Around 1840, theatrical singing by males went through a revolution. Men wanted to portray more masculinity and raw emotion with vocal timbre. “Do di petto”, Italian for “C in chest voice”, was introduced by the operatic tenor Gilbert Duprez in 1837 and soon became a phenomenon. A heroic voice in opera took on more of a brass-like quality than a flute-like quality. Similarly, in the early to mid-twentieth century (1920-1950), female singers were driven by the desire to sing with a richer timbre, one that matched brass and percussion instruments rather than strings or flutes. Ethel Merman became an icon in this revolution. This led to the theatre belt sound produced by females today, which has much in common with a trumpet sound.


Fig 1. Mouth opening to head-size ratio for Ethel Merman and corresponding frequency spectrum for the sound “aw” with a fundamental frequency fo (pitch) at 547 Hz and a second harmonic frequency 2fo at 1094 Hz.

The length of an uncoiled trumpet horn is about 2 meters (including the full length of the valves), whereas the length of a human airway above the glottis (the space between the vocal cords) is only about 17 cm (Fig. 2). The vibrating lips and the vibrating vocal cords can produce similar pitch ranges, but the resonators have vastly different natural frequencies due to the more than 10:1 ratio in airway length.  So, we ask, how can the voice produce a brass-like timbre in a “call” or “belt”?
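
A back-of-envelope calculation shows why this length difference matters so much. Treating the supraglottal airway as a uniform tube closed at the glottis and open at the lips, a standard simplification with round numbers assumed:

```latex
% Resonances of a uniform tube closed at one end and open at the other:
\[
  f_n = \frac{(2n-1)\,c}{4L}, \qquad n = 1, 2, 3, \dots
\]
% With c \approx 350 m/s (warm, humid air) and L \approx 0.17 m:
\[
  f_1 \approx \frac{350}{4 \times 0.17} \approx 515~\mathrm{Hz},
  \qquad f_2 \approx 3 f_1 \approx 1540~\mathrm{Hz}.
\]
% A tube roughly ten times longer has resonances roughly ten times lower,
% so the ~2 m trumpet and the ~17 cm airway cannot share resonances
% without the structural tricks described next.
```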

One structural similarity between the human instrument and the brass instrument is the shape of the airway directly above the glottis, a short and narrow tube formed by the epiglottis.  It corresponds to the mouthpiece of brass instruments.  This mouthpiece plays a major role in shaping the sound quality.  A second structural similarity is created when a singer uses a wide mouth opening, simulating the bell of the trumpet.  With these two structural similarities, the spectrum of tones produced by the two instruments can be quite similar, despite the huge difference in the overall length of the instrument.


Fig 2. Human airway and trumpet (not drawn to scale).

Acoustically, the call- or belt-like quality is achieved by strengthening the second harmonic frequency 2fo in relation to the fundamental frequency fo.  In the human instrument, this can be done by choosing a bright vowel like /æ/ that puts an airway resonance near the second harmonic.  The fundamental frequency then carries significantly less energy than the second harmonic.
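
As a worked example, using the values from Fig. 1 (our arithmetic, based on the figure’s numbers):

```latex
% Tuning the first airway resonance f_{R1} to the second harmonic:
\[
  f_o = 547~\mathrm{Hz} \;\Rightarrow\; 2 f_o = 1094~\mathrm{Hz},
  \qquad f_{R1} \approx 2 f_o \approx 1.1~\mathrm{kHz},
\]
% i.e., the vowel is shaped so that the resonance reinforces the second
% harmonic rather than the fundamental.
```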

Why does that resonance adjustment produce a brass-like timbre?  To understand this, we first recognize that, in brass-instrument playing, the tones produced by the lips are entrained (synchronized) to the resonance frequencies of the tube.  Thus, the tones heard from the trumpet are the resonance tones. These resonance tones form a harmonic series, but the fundamental tone in this series is missing.  It is known as the pedal tone.  Thus, by design, the trumpet has a strong second harmonic frequency with a missing fundamental frequency.

Perceptually, an imaginary fundamental frequency may be produced by our auditory system when a series of higher harmonics (equally spaced overtones) is heard.  Thus, the fundamental (pedal tone) may be perceptually present to some degree, but the highly dominant second harmonic determines the note that is played.

In belting and loud calling, the fundamental is not eliminated, but suppressed relative to the second harmonic.  The timbre of belt is related to the timbre of a trumpet due to this lack of energy in the fundamental frequency.  There is a limit, however, in how high the pitch can be raised with this timbre.  As pitch goes up, the first resonance of the airway has to be raised higher and higher to maintain the strong second harmonic.  This requires ever more mouth opening, literally creating a trumpet bell (Fig. 3).


Fig 3. Mouth opening to head-size ratio for Idina Menzel and corresponding frequency spectrum for a belt sound with a fundamental frequency (pitch) at 545 Hz.

Note the strong second harmonic frequency 2fo in the spectrum of frequencies produced by Idina Menzel, a current musical theatre singer.

One final comment about the perceived pitch of a belt sound is in order.  Pitch perception is not only related to the fundamental frequency, but the entire spectrum of frequencies.  The strong second harmonic influences pitch perception. The belt timbre on a D5 (587 Hz) results in a higher pitch perception for most people than a classical soprano sound on the same note. This adds to the excitement of the sound.