2pAOb – Methane in the ocean: observing gas bubbles from afar

Tom Weber – tom.weber@unh.edu
University of New Hampshire
24 Colovos Road
Durham, NH 03824

Popular version of paper 2pAOb
Presented Tuesday Afternoon, November 29, 2016
172nd ASA Meeting, Honolulu

The more we look, the more we find bubbles of methane, a greenhouse gas, leaking from the ocean floor (e.g., [1]). Some of the methane in these gas bubbles may travel to the ocean surface where it enters the atmosphere, and some is consumed by microbes, generating biomass and the greenhouse gas carbon dioxide in the process [2]. Given the vast quantities of methane thought to be contained beneath the ocean seabed [3], understanding how much methane goes where is an important component of understanding climate change and the global carbon cycle.

Fortunately, gas bubbles are very easy to observe acoustically. The gas inside a bubble acts like a very soft spring compared to the nearly incompressible ocean water surrounding it. If we compress this spring with an acoustic wave, the water surrounding the bubble moves with it as an entrained mass. This simple mass-spring system is not conceptually different from the suspension system (the spring) on your car (the mass): driving over a washboard dirt road at the wrong speed (or, for a bubble, insonifying it at just the right acoustic frequency) can elicit a very uncomfortable (or, for a bubble, very loud) response. We try to avoid these conditions in our vehicles, but exploiting the acoustic resonance of a gas bubble helps us detect centimeter-sized (or smaller) bubbles when they are kilometers away (Fig. 1).
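For readers who want a feel for the numbers, the resonance described above is captured by the classic Minnaert formula for a spherical bubble. The sketch below is our own illustration, not part of the paper's analysis; the gas properties (air-like; methane's heat-capacity ratio is slightly lower), seawater density, and example depths are assumed values.

```python
import math

def minnaert_frequency(radius_m, depth_m=0.0, gamma=1.4, rho=1025.0,
                       p_atm=101325.0, g=9.81):
    """Approximate resonance frequency (Hz) of a spherical gas bubble,
    using the Minnaert relation f = (1 / (2*pi*a)) * sqrt(3*gamma*P / rho).
    Surface tension and thermal effects are neglected; gamma = 1.4 is for
    air (methane is closer to 1.3)."""
    pressure = p_atm + rho * g * depth_m   # hydrostatic pressure at depth (Pa)
    return (1.0 / (2.0 * math.pi * radius_m)) * math.sqrt(3.0 * gamma * pressure / rho)

# A bubble with a 5 mm radius resonates near 650 Hz just below the surface,
# but near 6.5 kHz at 1000 m depth, where the ambient pressure is ~100x higher.
print(round(minnaert_frequency(0.005)))                # ~650 Hz
print(round(minnaert_frequency(0.005, depth_m=1000)))  # ~6500 Hz
```

Driving a sonar near this frequency is what makes such a tiny target stand out so strongly against the background.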


Figure 1. Top row: observations of methane gas bubbles exiting the ocean floor (picture credit: NOAA OER). The red circle shows methane hydrate (methane ice). Bottom row: acoustic observations of methane gas bubbles rising through the water column.

Methane bubbles leaving the ocean floor undergo a complicated evolution as they rise through the water column: gas is transferred both into and out of the bubble, so that its composition near the sea surface can look very different than it did at its ocean-floor origin, and coatings on the bubble wall can change both the speed at which the bubble rises and the rate at which gas enters or exits it. Understanding the various ways in which methane bubbles contribute to the global carbon cycle requires understanding these details of a methane bubble’s lifetime in the ocean. We use acoustic remote sensing techniques, combined with our understanding of the acoustic response of resonant bubbles, to help answer the question of where the methane gas goes. In doing so we map the locations of methane gas bubble sources on the seafloor (Fig. 2), measure how high into the water column gas bubbles rise, and use calibrated acoustic measurements to help constrain models of how bubbles change during their ascent.


Figure 2. A map of acoustically detected methane gas bubble seeps (blue dots) in the northern Gulf of Mexico in water depths of approximately 1000-2000 m. Oil pipelines on the seabed are shown as yellow lines.

Not surprisingly, answering these questions generates new ones, including how the acoustic response of large, wobbly bubbles (Fig. 3) differs from that of small, spherical ones, and what impact methane hydrate (methane-ice) coatings have on both the fate of the bubbles and their acoustic response. Given how much of the ocean remains unexplored, we expect to be learning about methane gas seeps and their role in our climate for a long time to come.


Figure 3. Images of large, wobbly bubbles approximately 1 cm in size. These types of bubbles are being investigated to help understand how their acoustic response differs from that of an ideal, spherical bubble. Picture credit: Alex Padilla.

[1] Skarke, A., Ruppel, C., Kodis, M., Brothers, D., & Lobecker, E. (2014). Widespread methane leakage from the sea floor on the northern US Atlantic margin. Nature Geoscience, 7(9), 657-661.

[2] Kessler, J. (2014). Seafloor methane: Atlantic bubble bath. Nature Geoscience, 7(9), 625-626.

[3] Ruppel, C. D. (2011). Methane hydrates and contemporary climate change. Nature Education Knowledge, 3(10), 29.

3aMU8 – Comparing the Chinese erhu and the European violin using high-speed camera measurements

Florian Pfeifle – Florian.Pfeifle@uni-hamburg.de

Institute of Systematic Musicology
University of Hamburg
Neue Rabenstrasse 13
22765 Hamburg, Germany
Popular version of paper 3aMU8, “Organologic and acoustic similarities of the European violin and the Chinese erhu”
Presented Wednesday morning, November 30, 2016
172nd ASA Meeting, Honolulu

0. Overview and introduction
Have you ever wondered what a violin solo piece like Paganini’s La Campanella would sound like if played on a Chinese erhu, or how an erhu solo performance of Horse Racing, a Mongolian folk song, would sound on a modern violin?

Our work investigates the acoustic similarities and differences between these two instruments, using high-speed camera measurements and piezoelectric pickups to record and quantify the motion and vibrational response of each instrument part individually.
The research question is: where do the acoustic differences between the two instruments begin, and what are the underlying physical mechanisms responsible for them?

1. The instruments
The Chinese erhu is the most popular instrument in the group of bowed string instruments known in China as huqin. It plays a central role in various kinds of classical music as well as in regional folk music styles. Figure 1 shows a handcrafted master luthier erhu. In orchestral and ensemble music its function is comparable to that of the European violin, as it often carries the lead voice.


Figure 1. A handcrafted master luthier erhu. This instrument is used in all of our measurements.

In contrast to the violin, the erhu is played in an upright position, resting on the left thigh of the musician. It has two strings, compared with the violin’s four. The bow is placed between the two strings instead of being drawn across them from above, as European bowed instruments are usually played. In addition to the difference in bowing technique, the left hand does not stop the strings against a fingerboard but presses the firmly taut strings directly, thereby changing their freely vibrating length. A similarity between the two instruments is the use of a horse-hair bow to excite the strings. An instrument similar to the erhu is documented from the 11th century onwards; the violin, from the 15th century. The historical development before that time is still not fully known, but there is some consensus among researchers that bowed lutes originated in central Asia, presumably somewhere along the Silk Road. Early pictorial sources point to a place of origin in the area known as Transoxiana, which spanned parts of modern Uzbekistan and Turkmenistan.

Comparing instruments from different cultural spheres and with different backgrounds is a many-faceted problem, as historical, cultural, structural and musical factors all play an important role in the aesthetic perception of an instrument. Measuring and comparing acoustical features of instruments can objectify this endeavour, at least to a certain degree. The method applied in this paper therefore aims at finding and comparing differences and similarities on an acoustical level, using several data acquisition methods. The measurement setup is depicted in Figure 2.


Figure 2. Measurement setup for both instrument measurements.

The vibrations of the strings are recorded using a high-speed camera, which captures the deflection of the bowed strings at a very high frame rate. An example video of such a measurement is shown in Video 1.

Video 1.  A high-speed recording of a bowed violin string.

The recorded motion of a string can then be tracked with sub-pixel accuracy using tracking software that traces the trajectory of a defined point on the string. The motion of the bridge is measured by attaching a miniature piezoelectric transducer, which converts microscopic motions into measurable electronic signals, to the bridge. We record the radiated instrument sound using a standard measurement microphone positioned one meter from the instrument’s main radiating part. This measurement setup yields three different types of data: the motion of the bowed string alone, without the influence of the instrument body; the combined motion of the bridge and the string; and a recording of the radiated instrument sound under normal playing conditions.
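As a rough illustration of the string-tracking step, one common approach is to estimate the position of a bright point on the string with sub-pixel accuracy from each high-speed frame by taking an intensity-weighted centroid along a fixed image column. This is a minimal sketch of that idea, not the authors' actual software, and it assumes the string (or a marker on it) is brighter than the background.

```python
import numpy as np

def track_string_column(frame, column, threshold=0.5):
    """Estimate the string's vertical position (sub-pixel) in one image column.
    'frame' is a 2D grayscale image; 'column' is the pixel column to analyze."""
    profile = frame[:, column].astype(float)
    profile -= profile.min()
    mask = profile > threshold * profile.max()       # keep only the bright marker pixels
    rows = np.arange(frame.shape[0])[mask]
    weights = profile[mask]
    return np.sum(rows * weights) / np.sum(weights)  # intensity-weighted centroid

# One deflection value per frame gives the string's motion at that point:
# deflection = [track_string_column(f, column=120) for f in video_frames]
```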

Returning to the initial question, we can now analyze and compare each measurement individually. Even more exciting, we can combine measurements of the string deflection of one instrument with the response of the other instrument’s body. In this way we can approximate how much influence the body has on the sound colour of the instrument, and whether it is possible to make an erhu performance sound like a violin performance, or vice versa. The following sound files convey an idea of this methodology by combining the string motion of part of a Mongolian folk song played on an erhu with the body of a European violin. Sound example 1 is a microphone recording of the erhu piece, and sound example 2 is the same passage using only the string measurement combined with a European violin body. To hear the difference clearly, headphones or reasonably good loudspeakers are recommended.
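One simple way to picture this combination is to treat the instrument body as a linear filter and pass the measured erhu string signal through an impulse response of the violin body. This is a sketch under that simplifying assumption, not necessarily the exact processing used in the paper, and the file names are hypothetical.

```python
import numpy as np
from scipy.signal import fftconvolve
from scipy.io import wavfile

# Hypothetical file names, for illustration only.
fs, erhu_string = wavfile.read("erhu_string_tracked.wav")   # string motion from the high-speed camera
fs2, violin_ir = wavfile.read("violin_body_ir.wav")         # impulse response of the violin body
assert fs == fs2, "both signals must share one sample rate"

# Filter the erhu string excitation through the violin body response.
hybrid = fftconvolve(erhu_string.astype(float), violin_ir.astype(float))
hybrid /= np.max(np.abs(hybrid))                             # normalize to avoid clipping

wavfile.write("erhu_string_through_violin_body.wav", fs, (hybrid * 32767).astype(np.int16))
```

Swapping which instrument contributes the string signal and which contributes the body response gives the "vice versa" comparison mentioned above.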

Audio File 1. A section of an erhu solo piece recorded with a microphone.

Audio File 2. A section of the same erhu piece, combining the erhu string measurement with a violin body.

2. Discussion
The results clearly show that the violin body has a noticeable influence on the timbre, or tone quality, of the piece when compared with the microphone recording of the erhu. Even so, owing to the specific tonal character of the piece itself, it does not sound like a composition from a European tradition. This means that stylistic and expressive idiosyncrasies remain easily recognizable and influence the perceived aesthetic of an instrument. The proposed technique could be extended to compare other instruments, such as plucked lutes like the guitar and pi’pa, or the mandolin and ruanxian.

4aPPa24 – Effects of meaningful or meaningless noise on psychological impression for annoyance and selective attention to stimuli during intellectual task

Takahiro Tamesue – tamesue@yamaguchi-u.ac.jp
Yamaguchi University
1677-1 Yoshida, Yamaguchi
Yamaguchi Prefecture 753-8511
Japan

Popular version of poster 4aPPa24, “Effects of meaningful or meaningless noise on psychological impression for annoyance and selective attention to stimuli during intellectual task”
Presented Thursday morning, December 1, 2016
172nd ASA Meeting, Honolulu

Open offices that make effective use of limited space and encourage dialogue, interaction, and collaboration among employees are becoming increasingly common. However, productive work-related conversation might actually decrease the performance of other employees within earshot, more so than random, meaningless noise. When carrying out intellectual activities involving memory or arithmetic tasks, it is a common experience for noise to cause an increased psychological impression of “annoyance,” leading to a decline in performance. This is more apparent for meaningful noise, such as conversation, than it is for random, meaningless noise. In this study, the impact of meaningless and meaningful noise on selective attention and cognitive performance in volunteers, as well as the degree of subjective annoyance caused by those noises, was investigated through physiological and psychological experiments.

The experiments were based on the so-called “odd-ball” paradigm, a test used to examine selective attention and information-processing ability. In the odd-ball paradigm, subjects detect and count rare target events embedded in a series of repetitive events, so completing the task requires regulating attention to the stimuli. In the auditory odd-ball test, subjects had to count the number of times the infrequent target sound occurred under meaningless or meaningful noise over a 10-minute period. The infrequent sound, appearing 20% of the time, was a 2 kHz tone burst; the frequent sound was a 1 kHz tone burst. In a visual odd-ball test, subjects observed pictures flashing on a PC monitor while meaningless or meaningful sounds were played to both ears through headphones. The infrequent image was a 10 x 10 centimeter red square; the frequent image was a green square. At the end of the trial, the subjects also rated their level of annoyance at each sound on a seven-point scale.
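As a rough illustration of how such an auditory odd-ball sequence can be generated, here is a minimal sketch; the burst duration and inter-stimulus interval are our assumptions, since they are not given above.

```python
import numpy as np

fs = 44100              # sample rate (Hz)
burst_len = 0.05        # 50 ms tone bursts (assumed duration)
isi = 1.0               # one stimulus per second (assumed interval)
n_trials = 600          # roughly 10 minutes of stimuli

t = np.arange(int(fs * burst_len)) / fs
frequent = np.sin(2 * np.pi * 1000 * t)    # 1 kHz standard tone
target = np.sin(2 * np.pi * 2000 * t)      # 2 kHz rare target tone

rng = np.random.default_rng(0)
is_target = rng.random(n_trials) < 0.20    # targets appear 20% of the time

sequence = np.zeros(int(fs * isi * n_trials))
for i, rare in enumerate(is_target):
    start = int(i * isi * fs)
    sequence[start:start + len(t)] = target if rare else frequent

# The subject's task is simply to report sum(is_target), the number of target tones.
```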

During the experiments, the subjects’ brain waves were measured through electrodes placed on their scalps. In particular, we looked at what are called “event-related potentials”: very small voltages generated in brain structures in response to specific events or stimuli, which appear in the electroencephalograph waveforms. Example waveforms of event-related potentials under no external noise, after appropriate averaging, are shown in Figure 1. The so-called N100 component peaks negatively about 100 milliseconds after the stimulus, and the P300 component peaks positively around 300 milliseconds after the stimulus; both are related to selective attention and working memory. Figures 2 and 3 show the event-related potentials for the infrequent sound under meaningless and meaningful noise. The N100 and P300 components are smaller in amplitude and longer in latency under meaningful noise than under meaningless noise.
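The “appropriate averaging” step works by cutting a short EEG segment around each stimulus and averaging across many stimuli, so that activity unrelated to the stimulus cancels out and the small N100 and P300 deflections emerge. This is a minimal sketch of that idea; the epoch boundaries and sampling rate are assumed values, not those of the study.

```python
import numpy as np

def average_erp(eeg, trigger_samples, fs, tmin=-0.1, tmax=0.6):
    """Cut epochs around each stimulus trigger, baseline-correct on the
    pre-stimulus interval, and average to reveal the event-related potential."""
    pre = int(-tmin * fs)
    post = int(tmax * fs)
    epochs = []
    for trig in trigger_samples:
        if trig - pre < 0 or trig + post > len(eeg):
            continue                                 # skip epochs running off the record
        epoch = eeg[trig - pre: trig + post].astype(float)
        epoch -= epoch[:pre].mean()                  # baseline correction
        epochs.append(epoch)
    return np.mean(epochs, axis=0)                   # averaged ERP waveform

# erp = average_erp(eeg_channel, target_onsets, fs=500)
```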

Figure 1. Averaged waveforms of event-related potentials evoked by the infrequent sound under no external noise.
Figure 2. Averaged waveforms of event-related potentials evoked by the infrequent sound under meaningless noise.
Figure 3. Averaged waveforms of auditory event-related potentials evoked under meaningful noise.

We employed a statistical method called “principal component analysis” to identify the latent components. Four principal components were extracted, as shown in Figure 4. Because the component scores under meaningful noise were smaller than under the other noise conditions, meaningful noise reduces these components of the event-related potentials. Thus, selective attention during cognitive tasks was influenced by the degree of meaningfulness of the noise.
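The principal component step can be sketched in a few lines. This is our illustration only; the matrix layout and variable names are assumptions, and the random placeholder data stand in for the averaged ERP waveforms.

```python
import numpy as np
from sklearn.decomposition import PCA

# erp_matrix: one row per averaged ERP waveform (e.g., per subject and noise condition),
# one column per time sample. Placeholder data are used here for the sketch.
erp_matrix = np.random.randn(30, 400)

pca = PCA(n_components=4)                # four components, as extracted in the study
scores = pca.fit_transform(erp_matrix)   # component scores per waveform
loadings = pca.components_               # time-course loadings (cf. Figure 4)

print(pca.explained_variance_ratio_)     # variance captured by each component
# Comparing the scores of a given component across noise conditions is what
# indicates that meaningful noise reduces that component of the ERP.
```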

Figure 4. Loadings of the principal component analysis.
Figure 5. Subjective experience of annoyance (auditory odd-ball paradigms).

Figure 5 shows the annoyance ratings in the auditory odd-ball paradigms. These results demonstrate that the subjective experience of annoyance in response to noise increases with the meaningfulness of the noise. Overall, whether the noise is meaningless or meaningful has a strong influence not only on selective attention to auditory stimuli during cognitive tasks, but also on the subjective experience of annoyance.

This means that when designing sound environments in spaces used for cognitive tasks, such as workplaces or schools, it is appropriate to consider not only the sound level but also the meaningfulness of the noise that is likely to be present. Surrounding conversations often disturb the work conducted in open offices. Because it is difficult to soundproof an open office, a way to mask meaningful speech with some other sound would be of great benefit for achieving a comfortable sound environment.

2pABa1 – Snap chat: listening in on the peculiar acoustic patterns of snapping shrimp, the noisiest animals on the reef

Ashlee Lillis – ashlee@whoi.edu
T. Aran Mooney – amooney@whoi.edu

Marine Research Facility
Woods Hole Oceanographic Institution
266 Woods Hole Road
Woods Hole, MA 02543

Popular version of paper 2pABa1
Presented Tuesday afternoon, November 29, 2016
172nd ASA Meeting, Honolulu

Characteristic soundscape recorded on a coral reef in St. John, US Virgin Islands. The conspicuous crackle is produced by many tiny snapping shrimp.

Put your head underwater in almost any tropical or sub-tropical coastal area and you will hear a continuous, static-like noise filling the water. The source of this ubiquitous sizzling sound, found in shallow-water marine environments around the world, was long considered a mystery of the sea. It was not until World War II investigations of this troublesome underwater sound that hidden colonies of a type of small shrimp were discovered to be the cause of the pervasive crackling (Johnson et al., 1947).

Individual snapping shrimp (Figure 1), sometimes referred to as pistol shrimp, measure less than a few centimeters but produce one of the loudest sounds in nature using a specialized snapping claw. The high-intensity sound is actually the result of a bubble collapsing when the claw is closed at incredibly high speed, creating not only the characteristic “snap” but also a flash of light and an extremely high temperature, all in a fraction of a millisecond (Versluis et al., 2000). Because these shrimp form large, dense aggregations, living unseen within reefs and rocky habitats, the combination of individual snaps creates the consistent crackling sound familiar to mariners. Snapping is used by shrimp for defense and territorial interactions, but our recent studies suggest it likely serves other, still unknown functions.


Figure 1. Images of the species of snapping shrimp, Alpheus heterochaelis, we are using to test hypotheses in the lab. This is the dominant species of snapping shrimp found coastally in the Southeast United States, but there are hundreds of different species worldwide, easily identified by their relatively large snapping claw.

Since snapping shrimp produce the dominant sound in many marine regions, changes in their activity or population substantially alter ambient sound levels at a given location and time. This means that the behavior of snapping shrimp exerts an outsized influence on the sensory environment of a variety of marine animals, and has implications for human uses of underwater sound (e.g., harbor defense, submarine detection). Despite this fundamental contribution to the acoustic environment of temperate and coral reefs, relatively little is known about snapping shrimp sound patterns or the underlying behaviors and environmental influences. So essentially, we ask: what is all the snapping about?

Figure 2 (missing). Photo showing an underwater acoustic recorder deployed in a coral reef setting. Recorders can be left to record sound samples at scheduled times (e.g. every 10 minutes) so that we can examine the long-term temporal trends in snapping shrimp acoustic activity on the reef.

Recent advances in underwater recording technology and interest in passive acoustic monitoring have aided our efforts to sample marine soundscapes more thoroughly (Figure 2), and we are discovering complex dynamics in snapping shrimp sound production. We collected long-term underwater recordings in several Caribbean coral reef systems and analyzed the snapping shrimp snap rates. Our soundscape data show that snap rates generally exhibit daily rhythms (Figure 3), but that these rhythms can vary over short spatial scales (e.g., opposite patterns between nearby reefs) and shift substantially over time (e.g., daytime versus nighttime snapping during different seasons). These acoustic patterns relate to environmental variables such as temperature, light, and dissolved oxygen, as well as individual shrimp behaviors themselves.
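Extracting snap rates from long recordings amounts to detecting the loud, impulsive snaps and counting them per unit time. The sketch below is a minimal, illustrative detector based on a simple envelope threshold, not the analysis pipeline used in the study; the file names, threshold, and mono-recording assumption are ours.

```python
import numpy as np
from scipy.io import wavfile

def snap_rate(wav_path, threshold_factor=8.0, min_gap_s=0.01):
    """Rough snap counter for a mono recording: flag samples whose magnitude exceeds
    a multiple of the median level, merge detections closer than min_gap_s, and
    return snaps per minute for the file."""
    fs, x = wavfile.read(wav_path)
    x = x.astype(float)
    env = np.abs(x)
    thresh = threshold_factor * np.median(env)
    peaks = np.where(env > thresh)[0]
    if len(peaks) == 0:
        return 0.0
    # Count a new snap only when detections are separated by more than min_gap_s.
    n_snaps = 1 + np.sum(np.diff(peaks) > min_gap_s * fs)
    return n_snaps / (len(x) / fs / 60.0)

# Applying snap_rate() to each scheduled recording (e.g., one file every 10 minutes)
# yields a time series like the one in Figure 3.
```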

Figure 3. Time-series of snap rates detected on two nearby USVI coral reefs for a week-long recording period. Snapping shrimp were previously thought to consistently snap more during the night, but we found in this study location that shrimp were more active during the day, with strong dawn and dusk peaks at one of the sites. This pattern conflicts with what little is known about snapping behaviors and is motivating further studies of why they snap.

The relationships between environment, behaviors, and sound production by snapping shrimp are really only beginning to be explored. By listening in on coral reefs, our work is uncovering intriguing patterns that suggest a far more complex picture of the role of snapping shrimp in these ecosystems, as well as the role of snapping for the shrimp themselves. Learning more about the diverse habits and lifestyles of snapping shrimp species is critical to better predicting and understanding variation in this dominant sound source, and has far-reaching implications for marine ecosystems and human applications of underwater sound.

References

Johnson, M. W., Everest, F. A., and Young, R. W. (1947). “The role of snapping shrimp (Crangon and Synalpheus) in the production of underwater noise in the sea,” Biol. Bull. 93, 122–138.

Versluis, M., Schmitz, B., von der Heydt, A., and Lohse, D. (2000). “How snapping shrimp snap: through cavitating bubbles,” Science, 289, 2114–2117. doi:10.1126/science.289.5487.2114

1aSA – On a Fire Extinguisher with Sound-wind for the Beginning Stage of Fire

Myung-Jin Bae, mjbae@ssu.ac.kr
Myung-Sook Kim, kimm@ssu.ac.kr
Soongil University, 369 Sangdo-ro, Dongjak-gu, 06978 Seoul Korea

Popular version of 1aSA “On a fire extinguisher using sound winds”
Presented 10:30 AM – 12:00 PM., November 28, 2016.
172nd ASA Meeting, Honolulu, U.S.A.

There is a variety of fire extinguishers available on the market with differing extinguishing methods, including powder, fluid, gas, and water dispersers. There has been little advancement in fire extinguisher technology in the past 50 years. Yet issues may arise when using any of these types of extinguishers during an emergency that hinder their smooth operation. For example, powder, fluid, or gas can solidify and become stuck inside the container, or batteries can discharge because of neglected maintenance. This leaves a need for a new kind of fire extinguisher that will operate reliably at the beginning stage of a fire without risk of failure. The answer may be the sound fire extinguisher.

The sound fire extinguisher has been in development since DARPA, the Defense Advanced Research Projects Agency of the United States, publicized the results of its project in 2012, showing that a fire can be put out by surrounding it with two large loudspeakers. The speakers were enormously large at the time because they needed to create enough sound power to extinguish the fire. As a follow-up, in 2015 American graduate students introduced a portable sound extinguisher and demonstrated it in a video posted on YouTube. But it still required heavy equipment, weighing 9 kilograms, was relatively weak in power, and had long cables. In August 2015, we, the Sori Sound Engineering Research Institute (SSERI), introduced an improved device: a sound extinguisher using a sound lens in the speaker to focus the sound, roughly 10 times stronger than the device presented in the YouTube video.

Our device still exhibited problems, such as its weight of over 2.5 kilograms and the need to hold it close to the flame. Here we introduce a further improved sound extinguisher that increases the efficiency of the device by utilizing sound-wind. As illustrated in Figures 1 and 2 below, sound fire extinguishers do not use any water or chemical agents as conventional extinguishers do; they only emit sound. When the sound extinguisher produces low-frequency sound at 100 Hz, its vibration energy reaches the flame, scatters the flame front, blocks the influx of oxygen, and subdues the flame.

The first version of the extinguisher introduced by the SSERI research team, in which a sound lens in the speaker focused the sound to roughly 10 times more power, is shown in Figure 1. It was relatively light, weighing only 2.5 kilograms, about one third the weight of previous devices, and could thus be carried in one hand without any connecting cables. It was also small, measuring 40 centimeters (a little more than a foot) in length. With a simple on-off switch, it is easy to operate at a distance of 1 to 2 meters (roughly 1 to 2 yards) from the flame. It can be used continuously for one hour when fully charged.

The further improved version of the sound fire extinguisher is shown in Figure 2. The most important improvement in the new extinguisher is the use of wind. Just as we blow out candles with air from our mouths, a fire can be put out by wind if its speed exceeds 5 meters per second when it reaches the flame. To achieve the power and speed required to put out a fire, we developed a way to increase the speed of the wind using low-powered speakers: a method of magnifying the power of the sound-wind.

Figure 1. The first sound fire extinguisher by SSERI: the mop type.

Figure 2. The improved extinguisher by SSERI: the portable type

Wind by itself is essentially white noise, but we imposed particular sound frequencies on the wind. When the wind is driven at a certain frequency, namely its resonance frequency, its amplitude is magnified and a stronger sound-wind is created. Figure 3 below illustrates the mechanism of a fire extinguisher with a sound-wind amplifier. A speaker produces low-frequency sound (100 Hz and below) and creates a sound-wind; the device resonates it, using the horn effect to magnify it and produce 15 times more power. The magnified sound-wind reaches the flame and instantly puts out the fire.
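To see why this amplification matters, here is a rough back-of-the-envelope estimate of our own (not a figure from the paper), using the standard plane-wave relation between acoustic pressure and particle velocity to ask what sound level would correspond to the 5 meters-per-second flame-blowout speed mentioned above.

```python
import math

# Plane-wave relation: particle velocity u = p / (rho * c).
rho_air = 1.2      # air density, kg/m^3
c_air = 343.0      # speed of sound in air, m/s
u_needed = 5.0     # wind speed needed at the flame, m/s

p_needed = u_needed * rho_air * c_air          # required acoustic pressure, Pa
spl = 20 * math.log10(p_needed / 20e-6)        # sound pressure level, dB re 20 uPa
print(round(p_needed), "Pa ->", round(spl), "dB SPL")
# Roughly 2 kPa, i.e. about 160 dB SPL, which suggests why a plain loudspeaker
# is not enough and a resonant, horn-amplified "sound-wind" is used instead.
```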

In summary, with these improvements the sound-wind extinguisher is best suited to the beginning stage of a fire. It can be used at home, at work, and on board aircraft, vessels, and cars. In the future, we will continue our efforts to further improve the sound-wind fire extinguisher so that it can become available for widespread use.

Figure 3: The mechanism of a sound-wind fire extinguisher

References
[1] DARPA Demonstration, https://www.youtube.com/watch?v=DanOeC2EpeA
[2] American graduate students (George Mason Univ.), https://www.youtube.com/watch?v=uPVQMZ4ikvM
[3] Park, S.Y., Yeo, K.S., Bae, M.J. “On a Detection of Optimal Frequency for Candle Fire-extinguishing,” ASK, Proceedings of 2015 Fall Conference of ASK, Vol. 34, No. 2(s), pp. 32, No. 13, Nov. 2015.
[4] Ik-Soo Ahn, Hyung-Woo Park, Seong-Geon Bae, Myung-Jin Bae, “A Study on a sound fire extinguisher using special sound lens,” Journal of the Acoustical Society of America, Vol. 139, No. 4, pp. 2077, April 2016.

*Video file attached: sound-wind extinguisher V2

2pSC – How do narration experts provide expressive storytelling in Japanese fairy tales?

Takashi Saito – saito@sc.shonan-it.ac.jp
Shonan Institute of Technology
1-1-25 Tsujido-Nishikaigan,
Fujisawa, Kanagawa, JAPAN

Popular version of paper 2pSC, “Prosodic analysis of storytelling speech in Japanese fairy tale”
Presented Tuesday afternoon, November 29, 2016
172nd ASA Meeting, Honolulu

Recent advances in speech synthesis technology bring us relatively high-quality synthetic speech, which smartphones today often provide as spoken message output. The acoustic sound quality, in particular, sometimes comes close to that of human voices. Prosodic aspects, or the patterns of rhythm and intonation, however, still leave large room for improvement. The overall speech messages generated by speech synthesis systems sound somewhat awkward and monotonous; in other words, they lack expressiveness compared with human speech. One reason is that most systems use a one-sentence synthesis scheme in which each sentence of the message is generated independently and the sentences are simply concatenated to construct the message. This lack of expressiveness might hinder widening the range of applications for speech synthesis. Storytelling is a typical application in which speech synthesis would need a control mechanism that extends beyond a single sentence in order to provide truly vivid and expressive narration. This work investigates the actual storytelling strategies of human narration experts with the aim of ultimately reflecting them in the expressiveness of speech synthesis.

A popular Japanese fairy tale, titled “The Inch-High Samurai” in its English translation, was the storytelling material in this study. It is a short story that takes about six minutes to tell aloud. The story consists of four elements typically found in simple fairy tales: introduction, build-up, climax, and ending. These common features make the story well suited for observing prosodic changes across the story’s flow. The story was told by six narration experts (four female and two male narrators) and was recorded. First, we were interested in what they were thinking while telling the story, so we interviewed them about their actual reading strategies after the recording. We found that they usually did not adopt fixed reading techniques for each sentence, but tried to enter the world of the story and form a clear image of the characters appearing in it, as an actor would. They also reported paying attention to the following aspects of the scenes associated with the story elements: in the introduction, featuring the birth of the little samurai character, they started to speak slowly and gently in an effort to grasp the hearts of listeners; in the story’s climax, depicting the extermination of the devil character, they tried to express a tense feeling through a quick rhythm and tempo; finally, in the ending, they gradually changed their reading style to let the audience understand that the happy ending was coming soon.

For all six speakers, a baseline speech segmentation into words and accentual phrases was conducted semi-automatically. We then used a multi-layered prosodic tagging method, performed manually, to provide information on various changes of “story states” relevant to impersonation, emotional involvement, and scene flow control. Figure 1 shows an example of the labeled speech data; the Wavesurfer [1] software served as our speech visualization and labelling tool. The example utterance contains part of the storyteller’s narration (the phrase “oniwa bikkuridesu,” meaning “the devil was surprised”) and the devil’s part (“ta ta tasukekuree,” meaning “please help me!”), as shown in the top label pane for characters (chrlab). The second label pane (evelab) shows event labels such as scene changes and emotional involvement (desire, joy, fear, etc.). In this example, a “fear” event is attached to the devil’s utterance. The dynamic pitch movement can be observed in the pitch contour pane at the bottom of the figure.

Figure 1. An example of the labeled speech data (character and event label panes with the corresponding pitch contour).

How are the scene-change and emotional-involvement events provided by human narrators manifested in the speech data? Four prosodic parameters are investigated for all the breath groups in the speech data: speed, measured as speech rate in mora/sec; pitch, measured in Hz; power, measured in dB; and preceding pause length, measured in seconds. A breath group is a speech segment uttered consecutively without pausing. Figures 2, 3 and 4 show these parameters at a scene-change event (Figure 2), a desire event (Figure 3), and a fear event (Figure 4). The vertical axis of the figures shows the ratio of each parameter to its average value. Each event has its own distinct tendency in the prosodic parameters, as seen in the figures, and this tendency appears fairly common to all speakers. For instance, the scene-change and desire events differ in the length of the preceding pause and in the relative contributions of the other three parameters. The fear event shows a tendency quite different from the other events, but it too is common to all speakers, though the degree of parameter movement differs between them. Figure 5 shows how character differences are expressed with these parameters when the reader impersonates the story’s characters. In short, speed and pitch are changed dynamically for impersonation, and this is a tendency common to all speakers.
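A minimal sketch of how the per-breath-group ratios behind Figures 2-5 can be computed is given below. This is our illustration only; the dictionary layout and key names are assumptions, not the study's data format.

```python
import numpy as np

def prosodic_ratios(breath_groups):
    """breath_groups: list of dicts with keys 'speed' (mora/sec), 'pitch' (Hz),
    'power' (dB), and 'pause' (preceding pause length, sec), one dict per breath
    group. Returns each parameter expressed as a ratio to its average over the
    whole narration, the normalization used on the vertical axes of Figures 2-5."""
    keys = ("speed", "pitch", "power", "pause")
    means = {k: np.mean([bg[k] for bg in breath_groups]) for k in keys}
    return [{k: bg[k] / means[k] for k in keys} for bg in breath_groups]

# Breath groups carrying a labeled event (scene change, desire, fear, ...) can then
# be pooled and their ratios compared against the rest of the narration.
```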

Based on the findings obtained from these human narrations, we are designing a framework that maps story events, such as scene changes and emotional involvement, to prosodic parameters. At the same time, additional databases need to be built to validate and reinforce the story-event description and the mapping framework.

Figure 2. Prosodic parameters at scene-change events. Figure 3. Prosodic parameters at desire events.
Figure 4. Prosodic parameters at fear events. Figure 5. Prosodic parameters for character impersonation.

[1] Wavesurfer: http://www.speech.kth.se/wavesurfer/