2aSCb3 – How would you sketch a sound with your hands?

Hugo Scurto – Hugo.Scurto@ircam.fr
Guillaume Lemaitre – Guillaume.Lemaitre@ircam.fr
Jules Françoise – Jules.Francoise@ircam.fr
Patrick Susini – Patrick.Susini@ircam.fr
Frédéric Bevilacqua – Frederic.Bevilacqua@ircam.fr
Ircam
1 place Igor Stravinsky
75004 Paris, France

Popular version of paper 2aSCb3, “Combining gestures and vocalizations to imitate sounds”
Presented Tuesday morning, November 3, 2015, 10:30 AM in Grand Ballroom 8
170th ASA Meeting, Jacksonville


Figure 1. A person hears the sound of a door squeaking and imitates it with vocalizations and gestures. Can another person understand what they mean?

Have you ever listened to an old Car Talk show? Here is what it sounded like on NPR back in 2010:

“So, when you start it up, what kind of noises does it make?
– It just rattles around for about a minute. […]
– Just like budublu-budublu-budublu?
– Yeah! It’s definitely bouncing off something, and then it stops”

As the example illustrates, it is often very complicated to describe a sound with words. But it is really easy to make it with our built-in sound-making system: the voice! In fact, we observed in earlier work that this is exactly what people do: when asked to communicate a sound to another person, people very quickly try to recreate the noise with their voice – and they also use a lot of gestures.

And this works! Communicating sounds with voice and gesture is much more effective than describing them with words and sentences. Imitations of sounds are fun, expressive, spontaneous, and widespread in human communication. These non-linguistic vocal utterances have been little studied, but they have the potential to provide researchers with new insights into several important questions in domains such as articulatory phonetics and auditory cognition.

The study we are presenting at this ASA meeting is part of a larger European project on how people imitate sounds with voice and gestures: SkAT-VG (“Sketching Audio Technologies with Voice and Gestures”, http://www.skatvg.eu). How do people produce vocal imitations (phonetics)? What are imitations made of (acoustics and gesture analysis)? How do other people interpret them (psychology)? The ultimate goal is to create “sketching” tools for sound designers (the people who create the sounds of everyday products). If you are an architect and want to sketch a house, you can simply draw it on a sketchpad. But what do you do if you are a sound designer and want to rapidly sketch the sound of a new motorbike? Well, all that is available today are cumbersome pieces of software. Instead, the SkAT-VG project aims to offer sound designers tools that are as intuitive as a sketchpad: they can simply use their voice and gestures to control complex sound design tools. To this end, the SkAT-VG project also conducts research in machine learning and sound synthesis, and studies how sound designers work.

Here at the ASA meeting, we are presenting one study from this project, in which we asked the question: “What do people use gestures for when they imitate a sound?” In fact, people use a lot of gestures, but we do not know what information these gestures convey: Are they redundant with the voice? Do they convey specific pieces of information that the voice cannot represent?

We first collected a large database of vocal and gestural imitations. We asked 50 participants to come to our lab and imitate sounds with their voice and gestures for several hours. We recorded their voices, filmed them with a high-speed camera, and used a depth camera and accelerometers to measure their gestures. This resulted in a database of about 8000 imitations! This database is an unprecedented amount of material that now allows us to study in detail how people combine voice and gestures when they imitate sounds.

We first analyzed the database qualitatively, by watching and annotating the videos. From this analysis, several hypotheses about the combination of gestures and vocalizations were drawn. Then, to test these hypotheses, we asked 20 participants to imitate 25 specially synthesized sounds with their voice and gestures.

The results showed a quantitative advantage of voice over gesture for communicating rhythmic information. The voice can accurately reproduce higher tempos than gestures can, and it is more precise than gestures when reproducing complex rhythmic patterns. We also found that people often use gestures in a metaphorical way, whereas the voice reproduces some acoustic features of the sound. For instance, people shake their hands very rapidly whenever a sound is stable and noisy. This type of gesture does not really follow a feature of the sound: it simply means that the sound is noisy.

Overall, our study reveals the metaphorical function of gestures during sound imitation. Rather than following an acoustic characteristic, gestures expressively emphasize the vocalization and signal the most salient features. These results will inform the specifications of the SkAT-VG tools and make the tools more intuitive.

2aSP5 – Using Automatic Speech Recognition to Identify Dementia in Early Stages

Roozbeh Sadeghian, J. David Schaffer, and Stephen A. Zahorian
Rsadegh1@binghamton.edu
SUNY at Binghamton
Binghamton, NY

Popular version of paper 2aSP5, “Using automatic speech recognition to identify dementia in early stages”
Presented Tuesday morning, November 3, 2015, 10:15 AM, City Terrace room
170th ASA Meeting, Jacksonville, Fl

The clinical diagnosis of Alzheimer’s disease (AD) and other dementias is very challenging, especially in the early stages. Dementia is widely believed to be underdiagnosed, at least partially because of the lack of a reliable non-invasive diagnostic test. Additionally, recruitment for clinical trials of experimental dementia therapies might be improved with a highly specific test. Although there is much active research into new biomarkers for AD, most of these methods are expensive and/or invasive, such as brain imaging (often with radioactive tracers) or sampling blood or spinal fluid followed by costly laboratory procedures.

There are good indications that dementias can be characterized by several aphasias (defects in the use of speech). This seems plausible since speech production involves many brain regions, and thus a disease that affects particular regions involved in speech processing might leave detectable fingerprints in the speech. Computerized analysis of speech signals and computational linguistics (analysis of word patterns) have progressed to the point where an automatic speech analysis system could be within reach as a tool for detecting dementia. The long-term goal is an inexpensive, short-duration, non-invasive test for dementia, one that can be administered in an office or home by clinicians with minimal training.

If a pilot study (cross-sectional design: only one sample from each subject) shows that suitable combinations of features derived from a voice sample can strongly indicate disease, then the research will move to a longitudinal design (many samples collected over time) in which sizable cohorts will be followed so that early indicators might be discovered.

A simple procedure for acquiring speech samples is to ask subjects to describe a picture (see Figure 1). Some such samples are available on the web (DementiaBank), but they were collected long ago and the audio quality is often poor. We used 140 of these older samples, but also collected 71 new samples with good-quality audio. Roughly half of the samples came from subjects with a clinical diagnosis of probable AD, and the others were demographically similar and cognitively normal (NL).


Figure 1. The pictures used for recording samples: (a) the famous “cookie theft” picture used for the older samples and (b) the picture used for the newly recorded samples.

One hundred twenty-eight features were automatically extracted from the speech signals, including pauses and pitch variation (indicating emotion); word-use features were extracted from manually prepared transcripts. In addition, we had the results of a popular cognitive test, the Mini-Mental State Exam (MMSE), for all subjects. While widely used as an indicator of cognitive difficulties, the MMSE is not by itself sufficiently diagnostic for dementia. We searched for patterns with and without the MMSE, which opens the possibility of a clinical test that combines speech with the MMSE. Multiple patterns were found using an advanced pattern discovery approach (genetic algorithms with support vector machines). The performances of two example patterns are shown in Figure 2. The training samples (red circles) were used to discover the patterns, so we expect them to perform well. The validation samples (blue) were not used for learning, only to test the discovered patterns. If we say that a subject will be declared AD when the test score is > 0.5 (the red line in Figure 2), we can see some errors: in the left panel there is one false positive (an NL case with a high test score, blue triangle) and several false negatives (AD cases with low scores, red circles).
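To make the decision rule above concrete, here is a minimal, hypothetical sketch in Python (not the authors’ code): it trains a support vector machine on made-up feature vectors standing in for the speech and MMSE features, applies the 0.5 threshold from Figure 2, and counts false positives and false negatives on held-out validation samples. The genetic-algorithm feature search is omitted.

```python
# Minimal sketch (not the authors' code): made-up feature vectors stand in for
# the 128 speech features and the MMSE score. It mirrors the decision rule
# described above: a score > 0.5 means "declare AD".
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical data: rows = subjects, columns = acoustic/linguistic features.
X_train = rng.normal(size=(100, 10))
y_train = rng.integers(0, 2, size=100)          # 0 = cognitively normal (NL), 1 = probable AD
X_valid = rng.normal(size=(40, 10))
y_valid = rng.integers(0, 2, size=40)

# A support vector machine produces a score for each subject;
# probability=True lets us read that score on a 0-1 scale.
clf = SVC(kernel="rbf", probability=True, random_state=0).fit(X_train, y_train)
scores = clf.predict_proba(X_valid)[:, 1]       # estimated probability of AD

declared_ad = scores > 0.5                      # the red-line threshold in Figure 2
false_positives = np.sum(declared_ad & (y_valid == 0))   # NL subjects above the line
false_negatives = np.sum(~declared_ad & (y_valid == 1))  # AD subjects below the line
print(false_positives, false_negatives)
```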


Figure 2. Two discovered diagnostic patterns (left with MMSE) (right without MMSE). The normal subjects are to the left in each plot (low scores) and the AD subjects to the right (high scores). No perfect pattern has yet been discovered. 

As mentioned above, manually prepared transcripts were used for these results, since automatic speaker-independent speech recognition is very challenging for small highly variable data sets.  To be viable, the test should be completely automatic.  Accordingly, the main emphasis of the research presented at this conference is the design of an automatic speech-to-text system and automatic pause recognizer, taking into account the special features of the type of speech used for this test of dementia.
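As a rough illustration of what an automatic pause recognizer has to do (a generic sketch, not necessarily the approach used in this work), a simple detector can flag stretches of low short-time energy in the recording as pauses:

```python
# Illustrative sketch of a simple energy-based pause detector (not necessarily
# the method used in this research). It marks long stretches of low short-time
# energy in a mono speech signal as pauses.
import numpy as np

def detect_pauses(signal, sample_rate, frame_ms=25, threshold_ratio=0.05, min_pause_s=0.3):
    """Return (start_s, end_s) pairs for pauses longer than min_pause_s."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.mean(frames.astype(float) ** 2, axis=1)   # short-time energy per frame

    is_silent = energy < threshold_ratio * np.max(energy) # low-energy frames
    pauses, start = [], None
    for i, silent in enumerate(is_silent):
        if silent and start is None:
            start = i                                      # a quiet stretch begins
        elif not silent and start is not None:
            if (i - start) * frame_ms / 1000 >= min_pause_s:
                pauses.append((start * frame_ms / 1000, i * frame_ms / 1000))
            start = None
    if start is not None and (n_frames - start) * frame_ms / 1000 >= min_pause_s:
        pauses.append((start * frame_ms / 1000, n_frames * frame_ms / 1000))
    return pauses
```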

3pBA5 – Using Acoustic Levitation to Understand, Diagnose, and Treat Cancer and Other Diseases

Brian D. Patchett – brian.d.patchett@gmail.com
Natalie C. Sullivan – nhillsullivan@gmail.com
Timothy E. Doyle – Timothy.Doyle@uvu.edu

Department of Physics
Utah Valley University
800 West University Parkway, MS 179
Orem, Utah 84058

Popular version of paper 3pBA5, “Acoustic Levitation Device for Probing Biological Cells With High-Frequency Ultrasound”
Presented Wednesday afternoon, November 4, 2015
170th ASA Meeting, Jacksonville

Imagine a new medical advancement that would allow scientists to measure the physical characteristics of diseased cells involved in cancer, Alzheimer’s, and autoimmune diseases. Through the use of high-frequency ultrasonic waves, such an advancement will allow scientists to test the normal healthy range of virtually any cell type for density and stiffness, providing new capabilities for analyzing healthy cell development as well as insight into the changes that occur as diseases develop and the cells’ characteristics begin to change.

Prior research methods of probing cells with ultrasound have relied upon growing the cells on the bottom of a Petri dish, which not only distorts the cells’ shape and structure but also interferes with the ultrasonic signals. A new method was therefore needed to probe the cells without disturbing their natural form, and to “clean up” the signals received by the ultrasound device. Research presented at the 2015 ASA meeting in Jacksonville, Florida, will show that the use of acoustic levitation is effective in providing the ideal conditions for probing the cells.

Acoustic levitation is a phenomenon whereby the pressure differences of stationary (standing) sound waves can be used to suspend small objects in gases or fluids such as air or water. We are currently exploring a new frontier in acoustic levitation of cellular structures in a fluid medium, perfecting a method to manipulate the shape and frequency of sound waves inside special containers. By manipulating these sound waves in just the right fashion, it is possible to isolate a layer of cells in a fluid such as water, which can then be probed with an ultrasound device. The cells are then in a more natural form and environment, and interference from the floor of the Petri dish is no longer a hindrance.
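As a back-of-the-envelope illustration of how such trapping layers are spaced (using assumed numbers, not the actual operating parameters of the device), the pressure nodes of a standing wave sit half a wavelength apart, so the spacing follows directly from the speed of sound in the fluid and the drive frequency:

```python
# Back-of-the-envelope sketch with assumed numbers (not the device's actual
# parameters): pressure nodes of a standing wave are half a wavelength apart.
speed_of_sound_water = 1480.0      # m/s, approximate speed of sound in water
drive_frequency = 1.0e6            # Hz, an assumed 1 MHz standing wave

wavelength = speed_of_sound_water / drive_frequency
node_spacing = wavelength / 2.0    # distance between trapping planes

print(f"wavelength = {wavelength * 1e3:.2f} mm, node spacing = {node_spacing * 1e3:.2f} mm")
# ~1.48 mm wavelength, ~0.74 mm between node planes at these assumed values
```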

This method has proven effective in the laboratory with neutrally buoyant beads that are roughly the same size and shape as human blood cells, and a study is currently underway to test the effectiveness of this method with biological samples. If effective, this will give researchers new experimental methods by which to study cellular processes, leading to a better understanding of how certain diseases develop in the human body.

2pSCb11 – Effect of Menstrual Cycle Hormone Variations on Dichotic Listening Results

Richard Morris – Richard.morris@cci.fsu.edu
Alissa Smith

Florida State University
Tallahassee, Florida

Popular version of poster presentation 2pSCb11, “Effect of menstrual phase on dichotic listening”
Presented Tuesday afternoon, November 3, 2015, 3:30 PM, Grand Ballroom 8

How speech is processed by the brain has long been of interest to researchers and clinicians. One method to evaluate how the two sides of the brain work when hearing speech is called a dichotic listening task. In a dichotic listening task two words are presented simultaneously to a participant’s left and right ears via headphones. One word is presented to the left ear and a different one to the right ear. These words are spoken at the same pitch and loudness levels. The listener then indicates what word was heard. If the listener regularly reports hearing the words presented to one ear, then there is an ear advantage. Since most language processing occurs in the left hemisphere of the brain, most listeners attend more closely to the right ear. The regular selection of the word presented to the right ear is termed a right ear advantage (REA).
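For readers who want a number attached to “ear advantage”: a commonly used summary measure in dichotic-listening research (not necessarily the exact measure used in this study) is a laterality index that compares correct right-ear and left-ear reports:

```python
# A common way to quantify ear advantage in dichotic listening (not necessarily
# the measure used in this study): a laterality index that is positive for a
# right ear advantage (REA) and negative for a left ear advantage.

def laterality_index(right_correct: int, left_correct: int) -> float:
    """(R - L) / (R + L) * 100; +100 = all right ear, -100 = all left ear."""
    total = right_correct + left_correct
    if total == 0:
        return 0.0
    return (right_correct - left_correct) / total * 100.0

# Hypothetical example: a listener correctly reports 40 right-ear syllables and
# 25 left-ear syllables, giving a modest REA of about +23.
print(laterality_index(40, 25))
```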

Previous researchers reported different responses from males and females to dichotic presentation of words. Those investigators found that males more consistently heard the word presented to the right ear and demonstrated a stronger REA. The female listeners in those studies exhibited more variability as to the ear of the word that was heard. Further research seemed to indicate that women exhibit different lateralization of speech processing at different phases of their menstrual cycle. In addition, data from recent studies indicate that the degree to which women can focus on the input to one ear or the other varies with their menstrual cycle.

However, the previous studies used small numbers of participants. The purpose of the present study was to complete a dichotic listening study with a larger sample of female participants. In addition, the previous studies focused on women who were not taking oral contraceptives, because women taking oral contraceptives were assumed to have smaller shifts in the lateralization of speech processing. Although this assumption is reasonable, it needs to be tested. For this study, it was hypothesized that the women would exhibit a greater REA during the days that they menstruate than during other days of their menstrual cycle. This hypothesis was based on the previous research reports. In addition, it was hypothesized that the women taking oral contraceptives would exhibit smaller fluctuations in the lateralization of their speech processing.

Participants in the study were 64 females, 19-25 years of age. Among the women 41 were taking oral contraceptives (OC) and 23 were not. The participants listened to the sound files during nine sessions that occurred once per week. All of the women were in good general health and had no speech, language, or hearing deficits.

The dichotic listening task was administered using the Alvin software package for speech perception research. The sound files consisted of consonant-vowel syllables composed of the six plosive consonants /b/, /d/, /g/, /p/, /t/, and /k/ paired with the vowel “ah”. The listeners heard the syllables over stereo headphones. Each listener set the loudness of the syllables to a comfortable level.

At the beginning of the listening session, each participant wrote down the date of the start of her most recent menstrual period on a participant sheet identified by her participant number. The participant then heard the recorded syllables and indicated the consonant heard by striking that key on the computer keyboard. Each listening session consisted of three presentations of the syllables, with a different randomization of the syllables for each presentation. In the first presentation, the stimuli were presented in a non-forced condition, in which the listener indicated the plosive that she heard most clearly. After the first presentation, the experimental files were presented in a forced-left or forced-right condition. In these two conditions the participant was directed to focus on the signal in the left or the right ear. The sequence of focusing on the left-ear or right-ear signal was counterbalanced over the sessions.

The statistical analyses of the listeners’ responses revealed that no significant differences occurred between the women using oral contraceptives and those who did not. In addition, correlations between the day of the women’s menstrual cycle and their responses were consistently low. However, some patterns did emerge for the women’s responses across the experimental sessions as opposed to the days of their menstrual cycle. The participants in both groups exhibited a higher REA and lower percentage of errors for the final sessions in comparison to earlier sessions.

The results from the current subjects differ from those previously reported. Possibly the larger sample size of the current study, the additional month of data collection, or the data recording method affected the results. The larger sample size might better represent how most women respond to dichotic listening tasks. The additional month of data collection may have allowed the women to learn how to respond to the task and then respond in a more consistent manner. In earlier studies, the short data collection period may have confounded learning to respond to a novel task with a hormonally dependent response. Finally, previous studies had the experimenter record the subjects’ responses, a method of data recording that may have added bias to the data collection. Further studies with large data sets and multiple months of data collection are needed to determine any effects of sex and oral contraceptive use on the REA.

2aAA9 – Quietly Staying Fit in the Multifamily Building

Paulette Nehemias Harvey – pendeavors@gmail.com
Kody Snow – ksnow@phoenixnv.com
Scott Harvey – sharvey@phoenixnv.com

Phoenix Noise & Vibration
5216 Chairmans Court, Suite 107
Frederick, Maryland 21703

Popular version of paper 2aAA9, “Challenges facing fitness center designers in multifamily buildings”
Presented Tuesday morning, November 3, 2015, 11:00 AM, Grand Ballroom 3
Session 2aAA, Acoustics of Multifamily Dwellings
170th ASA Meeting, Jacksonville

Please keep in mind that the research described in this Lay Language Paper may not have yet been peer reviewed.

Transit-centered living relies on amenities close to home, mixing multifamily residential units with professional, retail, and commercial units on the same site. Residents use the nearby trains to get to work and to go out, but rely on the immediate neighborhood, even the lobby, for errands and everyday needs. Transit-centered living is appealing: it eliminates sitting in traffic, seems good for the environment, and adds a sense of security, aerobic health, and time-saving convenience. Include an on-site fitness center and residents don’t even have to wear street clothes to get to their gym!

Developers know that a state-of-the-art fitness center is icing on their multifamily residence cake as far as attracting buyers. Gone is the little interior room with a couple treadmills and a stationary bike. Today’s designs include panoramic views, and enough ellipticals, free weights, weight & strength machines, and shower rooms to fill 2500-4000 square feet, not to mention the large classes offered with high energy music and an enthusiastic leader with a microphone. The increased focus on maintaining aerobic health, strength and mobility is fantastic, but the noise and vibration it generates? Not so great. Sometimes cooperative scheduling keeps the peace, but often residents will want to have access to their fitness center at all hours, so wise project leaders involve a noise control engineer early in the design process to develop a fitness center next to which everyone will want to live.

Remember the string and two empty cans? Stretch the string taut and conversations can travel the length of the string, but pinch the string and the system fails. Because noise travels through all kinds of structures as well as through the air, the design goal of the noise and vibration experts is to prevent that transmission. Airborne noise control can be effective using a combination of layered gypsum board, fiberglass batt insulation, concrete, and resilient framing members that absorb the sound rather than transmit it through a wall or floor/ceiling system. Controlling structure-borne noise and vibration can involve much thicker rubber mats, isolated concrete slabs, and a design that incorporates the structural engineer’s input on stiffening the base building structure. And it is not simply noise that the design is intended to restrict; it is silent but annoying vibrations as well.
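To give a feel for why mass and layering matter for airborne sound, here is a rough, simplified illustration using the textbook field-incidence “mass law” estimate for a single panel, with assumed round-number surface masses rather than a real wall design; a noise control engineer’s analysis goes well beyond this:

```python
# Rough illustration using the textbook field-incidence "mass law" estimate for
# a single limp panel: TL ~ 20*log10(surface_mass * frequency) - 47 dB.
# Surface masses below are assumed round numbers, not a real wall design.
import math

def mass_law_tl(surface_mass_kg_m2: float, frequency_hz: float) -> float:
    """Approximate airborne transmission loss (dB) of a single limp panel."""
    return 20.0 * math.log10(surface_mass_kg_m2 * frequency_hz) - 47.0

for mass in (10.0, 20.0, 40.0):   # e.g. one gypsum layer vs. two vs. a heavier assembly
    print(f"{mass:>5.0f} kg/m^2 at 125 Hz: ~{mass_law_tl(mass, 125.0):.0f} dB")
# Doubling the surface mass buys roughly 6 dB more transmission loss; layered,
# decoupled assemblies with insulation and resilient framing do considerably
# better than this single-panel estimate, which is why they are used.
```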


Reducing the floor-shaking impact of dropping barbells on the ground is the opposite of hearing a pin drop. Heavy steel plates loaded on a barbell, lifted 6-8 feet off the ground and then dropped. Repeatedly. Nobody wants to live under that, so designers think location, location, location. But big windows are pointless in the basement, so something has to go under the fitness center. Garage space, storage units, or mechanical rooms won’t mind the calisthenics above them. And sometimes the overall design of the building structure, whether it be a high-rise with underground parking, a Texas wrap building (a U-shaped building with an elevated parking garage on the interior), or a podium-style building, can offer an ideal location for this healthy necessity.

It’s not an acoustical trade secret that the best method of noise control is at the source, so consider what makes the noise. Manufacturers have met the demand to replace the old standard graduated steel barbell plates used in free weight combinations with rubber/urethane-coated steel weights. These weights make much less noise when impacting each other, but are still capable of generating excessive structure-borne noise levels. This is a great example of controlling both airborne (plates clanking together) and structure-borne (barbells impacting the floor) transmission paths. Speakers and sound systems and the wall/floor/ceiling systems can work together to offer clarity and quality to listeners while limiting what the neighbors will hear, but it takes expertise and attention.

Disregarding the recommendations of noise and vibration professionals can result in an annoying on-site gym that brings stressful tension and ongoing conflict, none of which promotes healthy well-being.

Foresight in design and attention to the acoustical specifications of building materials, under the direction of a noise and vibration engineer, ensure a fitness center that is a pleasant, effective space for fitness and social opportunities and an asset to the transit-centered neighborhood. Do everyone a favor and pay attention to good design and product specification early on; that’s sound advice.