4aEA2 – How soon can you use your new concrete driveway?

Jinying Zhu: jyzhu@unl.edu
Department of Civil Engineering
University of Nebraska-Lincoln
1110 S 67th St., Omaha, NE 68182, USA

Popular version of paper 4aEA2, “Monitoring hardening of concrete using ultrasonic guided waves”
Presented Thursday morning, Nov. 5, 2015, 8:50 AM, ORLANDO room,
170th ASA Meeting, Jacksonville, FL

Concrete is the most commonly used construction material in the world. The performance of concrete structures is largely determined by the properties of fresh concrete at early ages. Concrete gains strength through a chemical reaction between water and cement (hydration), which gradually changes a fluid fresh concrete mix into a rigid, hard solid. This process is called setting and hardening. It is important to measure setting times: if the concrete sets too early, there may not be enough time to mix and place it, while setting too late delays strength gain. The setting and hardening process is affected by many parameters, including the water-to-cement ratio, temperature, and chemical admixtures. The standard method for testing setting time is to measure the penetration resistance of fresh concrete samples in the laboratory, which may not represent real conditions in the field.

Figure 1. Principle of the ultrasonic guided wave test.

Ultrasonic waves have been proposed for monitoring the setting and hardening of concrete by measuring changes in wave velocity. As concrete hardens, its stiffness increases, and so does the ultrasonic velocity. The authors found a clear relationship between shear wave velocity and the traditional penetration resistance. However, most ultrasonic tests measure a small volume of concrete in the laboratory and are not suitable for field application. In this paper, the authors propose an ultrasonic guided wave test method. Steel reinforcements (rebars) are used in most concrete structures. When ultrasonic guided waves propagate along a rebar, they leak energy into the surrounding concrete, and the energy leakage rate is proportional to the stiffness of the concrete. Ultrasonic waves can be introduced into a rebar from one end, and the echo signal is received at the same end by the same ultrasonic sensor. The test setup is simple and can monitor the concrete hardening process continuously.

Figure 2. Ultrasonic echo signals measured in an embedded rebar for concrete ages of 2 to 6 hours.
Figure 3. Guided wave attenuation rate in a rebar embedded in different cement pastes.

Figure 2 shows guided wave echo signals measured on a 19 mm diameter rebar embedded in concrete. The signal amplitude clearly decreases as the concrete ages from 2 to 6 hours. The attenuation can be plotted against age for different cement and concrete mixes. Figure 3 shows the attenuation curves for three cement paste mixes. A cement mix with a larger water-to-cement ratio (w/c) is known to gain strength more slowly, which agrees with the ultrasonic guided wave test: the w/c = 0.5 mix has a lower attenuation rate. When there is a void around the rebar, less energy leaks than when there is no void, which is also confirmed by the test results in Figure 3.
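The falling echo amplitude can be turned into an attenuation rate of the kind plotted against concrete age. The following is a minimal sketch, not the authors' processing: it assumes the echo travels the embedded length twice (out and back), expresses the loss in decibels per meter, and uses invented amplitudes and an invented embedded length purely for illustration.

```python
import math

def attenuation_db_per_m(ref_amplitude, echo_amplitude, embedded_length_m):
    """Leakage attenuation of an end-reflected echo, in dB per meter.

    Assumes the wave travels the embedded length twice (out and back)
    and that ref_amplitude is the echo amplitude before any leakage.
    """
    if ref_amplitude <= 0 or echo_amplitude <= 0 or embedded_length_m <= 0:
        raise ValueError("amplitudes and length must be positive")
    loss_db = 20.0 * math.log10(ref_amplitude / echo_amplitude)
    return loss_db / (2.0 * embedded_length_m)

# Hypothetical echo amplitudes (arbitrary units) keyed by concrete age in hours.
echoes = {2: 0.80, 3: 0.60, 4: 0.35, 5: 0.20, 6: 0.10}
reference = 1.0   # assumed echo amplitude while the concrete is still fluid
length_m = 0.5    # assumed embedded rebar length

rates = {age: attenuation_db_per_m(reference, amp, length_m)
         for age, amp in echoes.items()}
for age, rate in sorted(rates.items()):
    print(f"{age} h: {rate:.1f} dB/m")
```

With these made-up numbers the attenuation rate rises steadily with age, which is the trend the echo signals are described as showing.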

Summary: This study presents experimental results on using ultrasonic guided waves to monitor the setting and hardening of concrete. It shows that the guided wave leakage attenuation is proportional to the stiffness change of fresh concrete. The leakage rate can therefore be used to monitor concrete strength gain at early ages. The approach may have broader applications in other disciplines for measuring the mechanical properties of materials using guided waves.

2pAAa10 – Turn around when you’re talking to me!

Jennifer Whiting – jkwhiting@physics.byu.edu
Timothy Leishman, PhD – tim_leishman@physics.byu.edu
K.J. Bodon – joshuabodon@gmail.com

Brigham Young University
N283 Eyring Science Center
Provo, UT 84602

Popular version of paper 2pAAa10, “High-resolution measurements of speech directivity”
Presented Tuesday afternoon, November 3, 2015, 4:40 PM, Grand Ballroom 3
170th ASA Meeting, Jacksonville

Introduction
In general, most sources of sound do not radiate equally in all directions. The human voice is no exception to this rule. How strongly sound is radiated in a given direction at a specific frequency, or pitch, is called directivity. While many [references] have studied the directivity of speaking and singing voices, some important details are missing. The research reported in this presentation measured directivity of live speech at higher angular and frequency resolutions than have been previously measured, in an effort to capture the missing details.

Measurement methods
The approach uses a semicircular array of 37 microphones spaced at five-degree polar-angle increments (see Figure 1). A subject sits on a computer-controlled rotating chair with his or her mouth aligned with the axis of rotation and the center of the microphone arc. He or she repeats a series of phonetically balanced sentences at each of 72 five-degree azimuthal-angle increments. This results in 2522 measurement points on a sphere around the subject.
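The figure of 2522 points is smaller than 37 × 72 = 2664. One way to reproduce it, sketched below, is to assume that the arc runs from directly above the subject to directly below, so that the two end microphones sit on the rotation axis and record the same point at every chair position. That geometry is an assumption made here to match the stated count, not a detail given in the text.

```python
def unique_measurement_points(polar_step_deg=5, azimuth_step_deg=5):
    """Set of distinct (polar, azimuth) directions sampled by the rotating setup."""
    points = set()
    for polar in range(0, 181, polar_step_deg):            # 37 microphones on the arc
        for azimuth in range(0, 360, azimuth_step_deg):    # 72 chair positions
            if polar in (0, 180):
                azimuth = 0   # a pole is the same point whatever the rotation
            points.add((polar, azimuth))
    return points

points = unique_measurement_points()
print(len(points))
```

Under this assumption the 35 off-axis microphones contribute 35 × 72 points and the two poles contribute one point each.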

[MISSING Figure 1. A subject and the measurement array]

Analysis
The measurements are based on audio recordings of the subject, who tries to repeat the sentences with exactly the same timing and inflection at each rotation. To account for the inevitable differences between repetitions, a transfer function and the coherence are computed between a reference microphone near the subject and each measurement microphone on the semicircular array. The coherence is used to examine how good each measurement is. The transfer functions for all measurement points make up the directivity. To visualize the results, each measurement is plotted on a sphere, where the color and the radius indicate how strongly sound is radiated in that direction at a given frequency. Animations of these spherical plots show how the directivity differs from frequency to frequency.
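To make the two quantities concrete, here is a minimal sketch of a transfer function and coherence estimate at a single frequency. It is not the authors' analysis code: it assumes the common H1 estimator (cross-spectrum divided by the reference auto-spectrum), averages over non-overlapping Hann-windowed frames, and uses synthetic noise in place of speech. All names and parameter values are illustrative.

```python
import cmath
import math
import random

def spectral_bin(frame, freq_hz, sample_rate_hz):
    """Hann-windowed complex amplitude of one frame at a single frequency."""
    n = len(frame)
    total = 0j
    for k, sample in enumerate(frame):
        window = 0.5 - 0.5 * math.cos(2.0 * math.pi * k / (n - 1))
        total += window * sample * cmath.exp(-2j * math.pi * freq_hz * k / sample_rate_hz)
    return total

def transfer_and_coherence(reference, measurement, freq_hz, sample_rate_hz, frame_len=256):
    """H1 transfer-function estimate and coherence at one frequency,
    averaged over consecutive non-overlapping frames."""
    sxx = syy = 0.0
    sxy = 0j
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        x = spectral_bin(reference[start:start + frame_len], freq_hz, sample_rate_hz)
        y = spectral_bin(measurement[start:start + frame_len], freq_hz, sample_rate_hz)
        sxx += abs(x) ** 2
        syy += abs(y) ** 2
        sxy += x.conjugate() * y
    return sxy / sxx, abs(sxy) ** 2 / (sxx * syy)

random.seed(0)
fs = 8000  # Hz, assumed sample rate for the synthetic signals
reference = [random.gauss(0.0, 1.0) for _ in range(16 * 256)]
clean = [0.5 * s for s in reference]                              # quieter copy of the reference
buried = [0.05 * s + random.gauss(0.0, 1.0) for s in reference]   # mostly unrelated noise

h_clean, coh_clean = transfer_and_coherence(reference, clean, 1000.0, fs)
h_buried, coh_buried = transfer_and_coherence(reference, buried, 1000.0, fs)
print(f"clean:  |H| = {abs(h_clean):.3f}, coherence = {coh_clean:.3f}")
print(f"buried: |H| = {abs(h_buried):.3f}, coherence = {coh_buried:.3f}")
```

The clean copy gives a coherence near one, while the signal buried in unrelated noise gives a low coherence. A similar drop would be expected in directions where little speech energy reaches the measurement microphone.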

[MISSING Figure 2. Balloon plot for male speech directivity at 500 and 1000 Hz.]
[MISSING Figure 3. Balloon plot for female speech directivity at 500 and 1000 Hz.]
[MISSING Animation 1. Male Speech Directivity, animated]
[MISSING Animation 2. Female Speech Directivity, animated]

Results and Conclusions
Some unique results are visible in the animations. Most importantly, as frequency increases, most of the sound is radiated in the forward direction. This is one reason why it’s hard to hear someone talking in the front of a car when you’re sitting in the back, unless they turn around to talk to you. The animations also show that as frequency increases and most of the sound radiates forward, coherence is poor toward the back. This doesn’t necessarily indicate a poor measurement, just a poor signal-to-noise ratio, since there is little sound energy in that direction. It’s also interesting that the polar angle of the strongest radiation changes with frequency. At some frequencies the sound is radiated strongly downward and to the sides, but at other frequencies the sound is radiated strongly upward and forward. Male and female directivities are similar in shape, but at different frequencies, since the fundamental frequencies of male and female voices are so different.

A more complete understanding of speech directivity would benefit several industries. For example, hearing aid companies can use speech directivity patterns to know where to aim the microphones in hearing aids to pick up the best sound for a wearer who is having a conversation. Microphone placement in cell phones can be adjusted to get a clearer signal from the person talking into the phone. The theater and audio industries can use directivity patterns to assist in positioning actors on stage, or in placing microphones near the speakers to record the most spectrally rich speech. The scientific community can develop more complete models of human speech based on these measurements. Further study will allow researchers to improve the measurement method and analysis techniques, understand the results more fully, and generalize them to all speech containing phonemes similar to those in these measurements.

4aAB2 – Seemingly simple songs: Black-capped chickadee song revisited

Allison H. Hahn – ahhahn@ualberta.ca
Christopher B. Sturdy – csturdy@ualberta.ca

University of Alberta
Edmonton, AB, Canada

Popular version of 4aAB2 – Seemingly simple songs: Black-capped chickadee song revisited
Presented Thursday morning, November 5, 8:55 AM, City Terrace Room
170th ASA Meeting, Jacksonville, FL

Vocal communication is a mode of communication important to many animal species, including humans. Over the past 60 years, songbird vocal communication has been widely studied, largely because the invention of the sound spectrograph allows researchers to visually represent vocalizations and make precise acoustic measurements. Black-capped chickadees (Poecile atricapillus; Figure 1) are one example of a songbird whose song has been well studied. Black-capped chickadees produce a short (less than 2 seconds), whistled fee-bee song. Compared to the songs produced by many songbird species, which often contain numerous note types without a fixed order, black-capped chickadee song is relatively simple, containing two notes produced in the same order during each song rendition. Although the songs appear to be acoustically simple, they contain a rich variety of information about the singer, including dominance rank, geographic location, and individual identity [1,2,3].

Interestingly, while songbird song has been widely examined, most of the focus (at least for North Temperate Zone species) has been on male-produced song, largely because it was thought that only males produced song. More recently, however, evidence has been mounting that in many songbird species both males and females sing [4,5]. In the study of black-capped chickadees, the focus has also been on male-produced song. Recently, however, we reported that female black-capped chickadees also produce fee-bee song. One possible reason that female song has not been extensively reported is that, to the human eye, male and female chickadees look identical, so singing females may be mistakenly identified as males. However, by identifying each bird’s sex (via DNA analysis) and recording both males and females, our work [6] has shown that female black-capped chickadees do produce fee-bee song. Additionally, these songs are overall acoustically similar to male song (songs of both sexes contain two whistled notes; see Figure 2), making it difficult for humans to tell the sexes apart by ear.

Our next objective was to determine whether any acoustic features differ between male and female songs. Using bioacoustic techniques, we were able to demonstrate that there are acoustic differences between male and female songs, with females producing songs that contain a greater frequency decrease in the first note than male songs do (Figure 2). These results demonstrate that there are sufficient acoustic differences to allow birds to identify the sex of a singing individual even in the absence of visual cues. Because birds may live in densely wooded environments, in which visual cues, but not auditory cues, are often obscured, being able to identify the sex of a singer (and whether it is a potential mate or a territory rival) would be an important ability.

Following our bioacoustic analysis, an important next step was to determine whether birds are able to distinguish between male and female songs. To examine this, we used a behavioral paradigm that is common in animal learning studies: operant conditioning. Using this task, we were able to demonstrate that birds can distinguish between male and female songs; however, the particular acoustic features birds use to discriminate between the sexes may depend on the sex of the listening bird. Specifically, we found evidence that male subjects responded based on information in the song’s first note, while female subjects responded based on information in the song’s second note [7]. One possible reason for this difference is that in the wild, males need to respond quickly to a rival male that is a territory intruder, while females may assess the entire song to gather as much information as possible about the singing individual (for example, information regarding a potential mate’s quality). While the exact function of female song is unknown, our studies clearly indicate that female black-capped chickadees produce songs and that the birds themselves can perceive differences between male and female songs.

Figure 1. An image of a black-capped chickadee.

Figure 2. Spectrograms (x-axis: time; y-axis: frequency in kHz) of a male song (top) and a female song (bottom).

Sound file 1. An example of a male fee-bee song.

Sound file 2. An example of a female fee-bee song.

References

  1. Hoeschele, M., Moscicki, M.K., Otter, K.A., van Oort, H., Fort, K.T., Farrell, T.M., Lee, H., Robson, S.W.J., & Sturdy, C.B. (2010). Dominance signalled in an acoustic ornament. Animal Behaviour, 79, 657–664.
  2. Hahn, A.H., Guillette, L.M., Hoeschele, M., Mennill, D.J., Otter, K.A., Grava, T., Ratcliffe, L.M., & Sturdy, C.B. (2013). Dominance and geographic information contained within black-capped chickadee (Poecile atricapillus) song. Behaviour, 150, 1601–1622.
  3. Christie, P.J., Mennill, D.J., & Ratcliffe, L.M. (2004). Chickadee song structure is individually distinctive over long broadcast distances. Behaviour, 141, 101–124.
  4. Langmore, N.E. (1998). Functions of duet and solo songs of female birds. Trends in Ecology and Evolution, 13, 136–140.
  5. Riebel, K. (2003). The “mute” sex revisited: vocal production and perception learning in female songbirds. Advances in the Study of Behavior, 33, 49–86.
  6. Hahn, A.H., Krysler, A., & Sturdy, C.B. (2013). Female song in black-capped chickadees (Poecile atricapillus): Acoustic song features that contain individual identity information and sex differences. Behavioural Processes, 98, 98–105.
  7. Hahn, A.H., Hoang, J., McMillan, N., Campbell, K., Congdon, J., & Sturdy, C.B. (2015). Biological salience influences performance and acoustic mechanisms for the discrimination of male and female songs. Animal Behaviour, 104, 213–228.

1pABb1 – Mice ultrasonic detection and localization in laboratory environment

Yegor Sinelnikov – yegor.sinelnikov@gmail.com
Alexander Sutin, Hady Salloum, Nikolay Sedunov, Alexander Sedunov
Stevens Institute of Technology
Hoboken, NJ 07030

Tom Zimmerman, Laurie Levine
DLAR Stony Brook University
Stony Brook, NY 11790

David Masters
Department of Homeland Security
Science and Technology Directorate
Washington, DC

Popular version of poster 1pABb1, “Mice ultrasonic detection and localization in laboratory environment”
Presented Tuesday afternoon, November 3, 2015, 3:30 PM, Grand Ballroom 3
170th ASA Meeting, Jacksonville

The house mouse, Mus musculus, has historically shared the human environment without much permission. It lives in our homes, enjoys our husbandry, and passes through walls and administrative borders unnoticed and unaware of our wary attention. Over thousands of years of coexistence, mice have excelled in a carrot-and-stick approach. Likewise, an ordinary wild mouse brings both danger and cure to humans today. The danger comes in the form of rodent-borne diseases; among them, plague epidemics, well remembered in European medieval history, continue to pose a threat to human health. The cure comes from mice lending themselves as research subjects for new therapeutic agents, owing to their genomic similarity to humans, small size, and short life span. Moreover, physiological similarities in inner ear construction and auditory brain responses, together with an unexpected richness in vocal signaling, have drawn tremendous interest to mouse bioacoustics and emotion perception.

The goal of this work is to start addressing possible threats reportedly carried by invasive species crossing US borders unnoticed in multiple cargo containers. This study focuses on demonstrating the feasibility of acoustic detection of potential rodent intrusions.

Animals communicate with smell, touch, movement, visual signaling, and sound. Mice came well equipped with sensory abilities to face the challenge of sharing a habitat with humans. They gave up color vision, developed exceptional stereoscopic smell, and learned to be deceptively quiet in the human auditory range, discreetly shifting their social acoustic interaction to higher frequencies. They predominantly use ultrasonic frequencies above the human hearing range as part of their day-to-day non-aggressive social interaction. Intricate ultrasonic mouse songs, composed of multiple syllables that often form complex phrases separated by periods of silence, are well known to researchers.

In this study, mouse sounds were recorded in a laboratory environment at an animal facility at Stony Brook University Hospital. The mice were allowed to move freely, a major condition for vocalization in the ultrasonic range; when confined to cages, mice did not produce ultrasonic signals. Four microphones with a flat ultrasonic frequency response were positioned in various arrangements and at various distances from the subjects, from a few centimeters to several meters. An example setup is shown in Figure 1. Three microphones, sensitive in the frequency range between 20 kHz and 100 kHz, were connected through preamplifiers and digital converters to a computer equipped with dedicated sound recording software. The fourth, calibrated microphone was used to measure the absolute sound level produced by a mouse. An operator monitored the spectrograms in real time to detect the onset of mouse communication and simplify subsequent data processing.

Figure 1. Experimental setup showing the three microphones (a) on a table with an unrestrained mouse (b), recording equipment (preamplifiers and digitizers) (c), and computer (d).

Listen to a single motif of mice ultrasonic vocalization and observe mouse movement here:

This sound fragment was slowed down fifteen times (down-converted) to make it audible. In reality, mouse social songs are well above the human audible range and are very fast. Spectrograms of mouse vocalizations recorded at distances of 1 m and 5 m are shown in Figure 2. The vocalization was detectable at 5 m and retained a recognizable pattern. Greater distances were not tested because of the limited room size.
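Playing a recording fifteen times slower divides every frequency by fifteen and stretches every duration by the same factor. The short sketch below works through this arithmetic for a hypothetical syllable; the 60 kHz pitch and 50 ms length are invented for illustration and are not measurements from the study.

```python
def slowed_playback(frequency_hz, duration_s, factor=15):
    """Frequency and duration heard when a recording is played `factor` times slower."""
    return frequency_hz / factor, duration_s * factor

# Hypothetical ultrasonic syllable: 60 kHz, lasting 50 ms.
heard_hz, heard_s = slowed_playback(60_000.0, 0.050)
print(f"{heard_hz:.0f} Hz for {heard_s:.2f} s")
```

A syllable like this one would land at a few kilohertz, comfortably inside the range of human hearing.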

Real-time detection of mouse vocalizations required a fast, noise-insensitive, automated algorithm, and an innovative approach was needed. Although no animal communication comes close to being a language, the richness and diversity of mouse ultrasonic vocalization prompted us to apply speech processing measures for real-time detection. A number of generic speech processing measures, such as temporal signal-to-noise ratio, cepstral distance, and likelihood ratio, were tested for detecting mouse vocalization events in the presence of background noise. These measures were calculated from acoustic measurements and compared with conventional techniques such as bandpass filtering, spectral power, and continuous monitoring of signal frames for the presence of expected tones.
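As a rough illustration of frame-based detection, the sketch below flags frames whose energy rises well above a noise floor. It is only loosely in the spirit of a temporal signal-to-noise measure and is not the authors' detector: the frame length, the 6 dB threshold, the assumption that the first few frames contain noise only, and the synthetic tone burst are all choices made here for the example.

```python
import math
import random

def frame_energies(signal, frame_len):
    """Mean squared amplitude of consecutive non-overlapping frames."""
    return [sum(s * s for s in signal[i:i + frame_len]) / frame_len
            for i in range(0, len(signal) - frame_len + 1, frame_len)]

def detect_events(signal, frame_len=100, noise_frames=5, threshold_db=6.0):
    """Indices of frames whose energy exceeds the noise floor by threshold_db.

    The noise floor is the mean energy of the first noise_frames frames,
    which are assumed to contain background noise only.
    """
    energies = frame_energies(signal, frame_len)
    noise_floor = sum(energies[:noise_frames]) / noise_frames
    return [i for i, e in enumerate(energies)
            if 10.0 * math.log10(e / noise_floor) > threshold_db]

# Synthetic recording: background noise with a tone burst in samples 1000-1299.
random.seed(1)
signal = [random.gauss(0.0, 0.1) for _ in range(2000)]
for k in range(1000, 1300):
    signal[k] += 0.5 * math.sin(2.0 * math.pi * 0.2 * k)

events = detect_events(signal)
print(events)
```

The flagged frames are the three that contain the burst. A real detector would also need to cope with changing noise levels, which this fixed noise floor does not.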

Figure 2. Sonograms of short ultrasonic vocalization syllables produced by mice at 1 m (left) and 5 m (right) from the microphones. The color scale is in decibels.

Although speech processing measures were invented to assess human speech intelligibility, we found them applicable to the acoustic detection of mice within a few meters. Leaving aside the question of the intelligibility of mouse vocalization, we concluded that the selected speech processing measures detected mouse vocalization events better than other generic signal processing techniques.

A secondary goal of this study was to process the detected vocalizations to determine the animal’s location. This is of particular interest for border patrol applications, where both acoustic detection and spatial localization are critical, and because mouse movement has behavioral specificity. To prove that localization is feasible, the vocalization events detected by each microphone pair were processed to determine the time difference of arrival (TDOA). The analysis was limited to nearby locations by the relatively short cabling. Because the animals were moving freely on the surface of a laboratory table, roughly coplanar with the microphones, the TDOA values were converted to an animal location using a simple triangulation scheme. The process is illustrated schematically in Figure 3 for two selected microphones. Note that despite the low signal-to-noise ratio at microphone 2, the vocalization events were successfully detected. The cross correlograms, calculated in the spectral domain with empirical normalization to suppress the effect of uncorrelated noise, yielded reliable TDOA values. A simple check that the TDOA values sum to zero was used as a consistency control. The calculated TDOA values were converted into spatial locations, which were assessed for correctness and for experimental and computational uncertainties, and compared with available video recordings. Despite a relatively high level of technogenic noise, the TDOA-derived locations agreed well with the video recordings. The TDOA localization uncertainty was estimated to be on the order of the size of a mouse, roughly corresponding to several wavelengths at 50 kHz. A larger number of microphones is expected to improve detectability and enable more precise three-dimensional localization.

Hence, mouse ultrasonic socialization sounds are detectable using speech processing techniques, and their TDOA values, identified by cross correlation, provide decent spatial localization of the animals in agreement with video observations.

Figure 3. The localization process. First, the detected vocalization events from two microphones (left) are paired and their cross correlogram is calculated (middle). The maxima, marked by asterisks, define a set of identified TDOA values. The process is repeated for every pair of microphones. Second, the triangulation is performed (right). The colored hyperbolas illustrate possible locations of the animal on the laboratory table based on the calculated TDOA values. The intersection of the hyperbolas gives the location of the animal. The numbered squares mark the locations of the microphones.
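The localization steps can be sketched in a few lines. This is an illustrative stand-in, not the authors' implementation: it uses a plain time-domain cross-correlation instead of the normalized spectral-domain correlogram, and a grid search over the table instead of intersecting hyperbolas. The microphone positions, sample rate, speed of sound, and the noise burst standing in for a vocalization are all assumptions.

```python
import math
import random

SPEED_OF_SOUND = 343.0   # m/s, assumed room-temperature value
SAMPLE_RATE = 200_000    # Hz, assumed; high enough for ultrasonic calls

def expected_tdoa(source, mic_a, mic_b):
    """Arrival time at mic_a minus arrival time at mic_b, in seconds."""
    return (math.dist(source, mic_a) - math.dist(source, mic_b)) / SPEED_OF_SOUND

def estimate_tdoa(sig_a, sig_b, max_lag):
    """TDOA (s) from the peak of a brute-force time-domain cross-correlation."""
    n = len(sig_a)
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score = sum(sig_a[i] * sig_b[i - lag]
                    for i in range(max(0, lag), min(n, n + lag)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / SAMPLE_RATE

def locate(tdoas, mics, table_size_m=0.5, step_m=0.01):
    """Grid point on the table whose predicted TDOAs best match the measured ones."""
    best_point, best_err = None, math.inf
    steps = int(round(table_size_m / step_m)) + 1
    for ix in range(steps):
        for iy in range(steps):
            point = (ix * step_m, iy * step_m)
            err = sum((expected_tdoa(point, mics[a], mics[b]) - t) ** 2
                      for (a, b), t in tdoas.items())
            if err < best_err:
                best_point, best_err = point, err
    return best_point

# Synthetic test: a short noise burst stands in for a vocalization.
random.seed(2)
mics = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5)]   # hypothetical microphone positions (m)
true_source = (0.20, 0.15)                     # hypothetical mouse position (m)
burst = [random.uniform(-1.0, 1.0) for _ in range(200)]

recordings = []
for mic in mics:
    delay = round(math.dist(true_source, mic) / SPEED_OF_SOUND * SAMPLE_RATE)
    recordings.append([0.0] * delay + burst + [0.0] * (400 - delay))

max_lag = 420  # samples; just above the largest mic spacing divided by the speed of sound
pairs = [(0, 1), (1, 2), (2, 0)]
tdoas = {(a, b): estimate_tdoa(recordings[a], recordings[b], max_lag) for a, b in pairs}

closure = sum(tdoas.values())   # consistency check: the three TDOAs should sum to zero
estimate = locate(tdoas, mics)
print(f"closure = {closure:.2e} s, estimated position = {estimate}")
```

The closure value plays the role of the zero-sum consistency check: going around the loop of microphone pairs, the arrival-time differences must cancel. The grid search trades precision for simplicity, since its resolution is limited by the chosen step size.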

The constructed recording system is particularly important for detecting mice in containers at US ports of entry, where low-frequency noise levels are high. This pilot study confirms the feasibility of using Stevens Institute’s ultrasonic recording system for simultaneous detection of mouse vocalization and movement.

This work was funded by the U.S. Department of Homeland Security’s Science and Technology Directorate. The views and conclusions contained in this paper are those of the authors and should not necessarily be interpreted as representing the official policies, either expressed or implied, of the U.S. Department of Homeland Security.

2aSCb3 – How would you sketch a sound with your hands?

Hugo Scurto – Hugo.Scurto@ircam.fr
Guillaume Lemaitre – Guillaume.Lemaitre@ircam.fr
Jules Françoise – Jules.Francoise@ircam.fr
Patrick Susini – Patrick.Susini@ircam.fr
Frédéric Bevilacqua – Frederic.Bevilacqua@ircam.fr
Ircam
1 place Igor Stravinsky
75004 Paris, France

Popular version of paper 2aSCb3, “Combining gestures and vocalizations to imitate sounds”
Presented Tuesday morning, November 3, 2015, 10:30 AM in Grand Ballroom 8
170th ASA Meeting, Jacksonville

Figure 1. A person hears the sound of a door squeaking and imitates it with vocalizations and gestures. Can the other person understand what he means?

Have you ever listened to an old Car Talk show? Here is what it sounded like on NPR back in 2010:

“So, when you start it up, what kind of noises does it make?
– It just rattles around for about a minute. […]
– Just like budublu-budublu-budublu?
– Yeah! It’s definitely bouncing off something, and then it stops”

As the example illustrates, it is often very complicated to describe a sound with words. But it is really easy to make it with our built-in sound-making system: the voice! In fact, we have observed earlier that this is exactly what people do: when we ask a person to communicate a sound to another person, she will very quickly try to recreate this noise with her voice – and also use a lot of gestures.

And this works! Communicating sounds with voice and gesture is much more effective than describing them with words and sentences. Imitations of sounds are fun, expressive, spontaneous, widespread in human communication, and very effective. These non-linguistic vocal utterances have been little studied, but nevertheless have the potential to provide researchers with new insights into several important questions in domains such as articulatory phonetics and auditory cognition.

The study we are presenting at this ASA meeting is part of a larger European project on how people imitate sounds with voice and gestures: SkAT-VG (“Sketching Audio Technologies with Voice and Gestures”, http://www.skatvg.eu). How do people produce vocal imitations (phonetics)? What are imitations made of (acoustics and gesture analysis)? How do other people interpret them (psychology)? The ultimate goal is to create “sketching” tools for sound designers (the people who create the sounds of everyday products). If you are an architect and want to sketch a house, you can simply draw it on a sketchpad. But what do you do if you are a sound designer and want to rapidly sketch the sound of a new motorbike? All that is available today are cumbersome pieces of software. Instead, the SkAT-VG project aims to offer sound designers new tools that are as intuitive as a sketchpad: they simply use their voice and gestures to control complex sound design tools. The SkAT-VG project therefore also conducts research in machine learning and sound synthesis, and studies how sound designers work.

Here at the ASA meeting, we are presenting a partial study in which we asked the question: “What do people use gestures for when they imitate a sound?” In fact, people use a lot of gestures, but we do not know what information these gestures convey: Are they redundant with the voice? Do they convey specific pieces of information that the voice cannot represent?

We first collected a huge database of vocal and gestural imitations. To build it, we asked 50 participants to come to our lab and make vocal and gestural imitations for several hours. We recorded their voices, filmed them with a high-speed camera, and used a depth camera and accelerometers to measure their gestures. This resulted in a database of about 8000 imitations! This database is an unprecedented amount of material that now allows

We first analyzed the database qualitatively, by watching and annotating the videos. From this analysis, several hypotheses about the combination of gestures and vocalizations were drawn. Then, to test these hypotheses, we asked 20 participants to imitate 25 specially synthesized sounds with their voice and gestures.

The results showed a quantitative advantage of voice over gesture for communicating rhythmic information. The voice can accurately reproduce higher tempos than gestures can, and is more precise than gestures when reproducing complex rhythmic patterns. We also found that people often use gestures in a metaphorical way, whereas the voice reproduces some acoustic features of the sound. For instance, people shake their hands very rapidly whenever a sound is stable and noisy. This type of gesture does not really follow a feature of the sound: it simply means that the sound is noisy.

Overall, our study reveals the metaphorical function of gestures during sound imitation. Rather than following an acoustic characteristic, gestures expressively emphasize the vocalization and signal the most salient features. These results will inform the specifications of the SkAT-VG tools and make the tools more intuitive.

2aSP5 – Using Automatic Speech Recognition to Identify Dementia in Early Stages

Roozbeh Sadeghian, J. David Schaffer, and Stephen A. Zahorian
Rsadegh1@binghamton.edu
SUNY at Binghamton
Binghamton, NY

Popular version of paper 2aSP5, “Using automatic speech recognition to identify dementia in early stages”
Presented Tuesday morning, November 3, 2015, 10:15 AM, City Terrace room
170th ASA Meeting, Jacksonville, FL

The clinical diagnosis of Alzheimer’s disease (AD) and other dementias is very challenging, especially in the early stages. Dementia is widely believed to be underdiagnosed, at least partially because of the lack of a reliable non-invasive diagnostic test. Additionally, recruitment for clinical trials of experimental dementia therapies might be improved with a highly specific test. Although there is much active research into new biomarkers for AD, most of these methods are expensive and/or invasive, such as brain imaging (often with radioactive tracers) or taking blood or spinal fluid samples followed by expensive lab procedures.

There are good indications that dementias can be characterized by several aphasias (defects in the use of speech). This seems plausible since speech production involves many brain regions, and thus a disease that affects particular regions involved in speech processing might leave detectable fingerprints in the speech. Computerized analysis of speech signals and computational linguistics (analysis of word patterns) have progressed to the point where an automatic speech analysis system could be within reach as a tool for detecting dementia. The long-term goal is an inexpensive, short-duration, non-invasive test for dementia, one that can be administered in an office or home by clinicians with minimal training.

If a pilot study (cross sectional design: only one sample from each subject) indicates that suitable combinations of features derived from a voice sample can strongly indicate disease, then the research will move to a longitudinal design (many samples collected over time) where sizable cohorts will be followed so that early indicators might be discovered.

A simple procedure for acquiring speech samples is to ask subjects to describe a picture (see Figure 1). Some such samples are available on the web (DementiaBank), but they were collected long ago and the audio quality is often poor. We used 140 of these older samples, and also collected 71 new samples with good-quality audio. Roughly half of the samples came from subjects with a clinical diagnosis of probable AD; the others came from subjects who were demographically similar and cognitively normal (NL).

Figure 1. The pictures used for recording samples: (a) the famous Cookie Theft picture used for the older samples and (b) the picture used for the newly recorded samples.

One hundred twenty-eight features were automatically extracted from the speech signals, including pauses and pitch variation (indicating emotion); word-use features were extracted from manually prepared transcripts. In addition, we had the results of a popular cognitive test, the Mini-Mental State Exam (MMSE), for all subjects. While widely used as an indicator of cognitive difficulties, the MMSE is not sufficiently diagnostic for dementia by itself. We searched for patterns with and without the MMSE, which opens the possibility of a clinical test that combines speech with the MMSE. Multiple patterns were found using an advanced pattern discovery approach (genetic algorithms with support vector machines). The performance of two example patterns is shown in Figure 2. The training samples (red circles) were used to discover the patterns, so we expect them to perform well. The validation samples (blue) were not used for learning, only to test the discovered patterns. If a subject is declared AD when the test score is > 0.5 (the red line in Figure 2), we can see some errors: in the left panel there is one false positive (an NL case with a high test score, blue triangle) and several false negatives (AD cases with low scores, red circles).

Figure 2. Two discovered diagnostic patterns (left: with MMSE; right: without MMSE). The normal subjects are to the left in each plot (low scores) and the AD subjects to the right (high scores). No perfect pattern has yet been discovered.
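The decision rule and the two kinds of error can be made concrete with a few lines of code. The sketch below applies the 0.5 threshold to a handful of scores and counts false positives and false negatives. The scores and diagnoses are invented for illustration only and do not come from the study, and the function names are hypothetical.

```python
def classify(scores, threshold=0.5):
    """Declare AD when the test score exceeds the threshold, otherwise NL."""
    return ["AD" if s > threshold else "NL" for s in scores]

def error_counts(predicted, actual):
    """Number of false positives (NL called AD) and false negatives (AD called NL)."""
    false_pos = sum(p == "AD" and a == "NL" for p, a in zip(predicted, actual))
    false_neg = sum(p == "NL" and a == "AD" for p, a in zip(predicted, actual))
    return false_pos, false_neg

# Hypothetical test scores and clinical diagnoses, invented for illustration only.
scores = [0.10, 0.35, 0.62, 0.20, 0.45, 0.80, 0.55, 0.30, 0.90, 0.70]
actual = ["NL", "NL", "NL", "NL", "AD", "AD", "AD", "AD", "AD", "AD"]

predicted = classify(scores)
false_pos, false_neg = error_counts(predicted, actual)
sensitivity = sum(p == a == "AD" for p, a in zip(predicted, actual)) / actual.count("AD")
specificity = sum(p == a == "NL" for p, a in zip(predicted, actual)) / actual.count("NL")
print(f"false positives: {false_pos}, false negatives: {false_neg}")
print(f"sensitivity: {sensitivity:.2f}, specificity: {specificity:.2f}")
```

Moving the threshold trades one kind of error for the other: a lower threshold catches more AD cases but also flags more normal subjects.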

As mentioned above, manually prepared transcripts were used for these results, since automatic speaker-independent speech recognition is very challenging for small, highly variable data sets. To be viable, the test should be completely automatic. Accordingly, the main emphasis of the research presented at this conference is the design of an automatic speech-to-text system and an automatic pause recognizer, taking into account the special features of the type of speech used for this test of dementia.