2aSP5 – Using Automatic Speech Recognition to Identify Dementia in Early Stages

Roozbeh Sadeghian, J. David Schaffer, and Stephen A. Zahorian
Rsadegh1@binghamton.edu
SUNY at Binghamton
Binghamton, NY

Popular version of paper 2aSP5, “Using automatic speech recognition to identify dementia in early stages”
Presented Tuesday morning, November 3, 2015, 10:15 AM, City Terrace room
170th ASA Meeting, Jacksonville, FL

The clinical diagnosis of Alzheimer’s disease (AD) and other dementias is very challenging, especially in the early stages. Dementia is widely believed to be underdiagnosed, at least partially because of the lack of a reliable non-invasive diagnostic test. Additionally, recruitment for clinical trials of experimental dementia therapies might be improved with a highly specific test. Although there is much active research into new biomarkers for AD, most of these methods are expensive and/or invasive, such as brain imaging (often with radioactive tracers) or taking blood or spinal fluid samples followed by expensive lab procedures.

There are good indications that dementias can be characterized by several aphasias (defects in the use of speech). This seems plausible since speech production involves many brain regions, and thus a disease that affects particular regions involved in speech processing might leave detectable fingerprints in the speech. Computerized analysis of speech signals and computational linguistics (analysis of word patterns) have progressed to the point where an automatic speech analysis system could be within reach as a tool for detection of dementia. The long-term goal is an inexpensive, short duration, non-invasive test for dementia; one that can be administered in an office or home by clinicians with minimal training.

If a pilot study (cross sectional design: only one sample from each subject) indicates that suitable combinations of features derived from a voice sample can strongly indicate disease, then the research will move to a longitudinal design (many samples collected over time) where sizable cohorts will be followed so that early indicators might be discovered.

A simple procedure for acquiring speech samples is to ask subjects to describe a picture (see Figure 1). Some such samples are available on the web (DementiaBank), but they were collected long ago and the audio quality is often poor. We used 140 of these older samples, but also collected 71 new samples with good quality audio. Roughly half of the samples had a clinical diagnosis of probable AD, and the others were demographically similar and cognitively normal (NL).

Figure 1. The pictures used for recording samples: (a) the famous “cookie theft” samples and (b) the newly recorded samples.

One hundred twenty-eight features were automatically extracted from speech signals, including pauses and pitch variation (indicating emotion); word-use features were extracted from manually prepared transcripts. In addition, we had the results of a popular cognitive test, the Mini-Mental State Exam (MMSE), for all subjects. While widely used as an indicator of cognitive difficulties, the MMSE is not sufficiently diagnostic for dementia by itself. We searched for patterns with and without the MMSE, which opens the possibility of a clinical test that combines speech with the MMSE. Multiple patterns were found using an advanced pattern discovery approach (genetic algorithms with support vector machines). The performances of two example patterns are shown in Figure 2. The training samples (red circles) were used to discover the patterns, so we expect them to perform well. The validation samples (blue) were not used for learning, only to test the discovered patterns. If we say that a subject will be declared AD if the test score is > 0.5 (the red line in Figure 2), we can see some errors: in the left panel we see one false positive (NL case with a high test score, blue triangle) and several false negatives (AD cases with low scores, red circles).


Figure 2. Two discovered diagnostic patterns (left with MMSE) (right without MMSE). The normal subjects are to the left in each plot (low scores) and the AD subjects to the right (high scores). No perfect pattern has yet been discovered. 
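
The decision rule described above (declare AD when the test score is above 0.5) can be illustrated with a short sketch. The scores below are invented for illustration only and are not data from the study; the function name and the sample layout are likewise assumptions.

```python
# Hypothetical (diagnosis, test score) pairs. These numbers are made up
# for illustration and are NOT results from the study.
samples = [
    ("NL", 0.12), ("NL", 0.31), ("NL", 0.58),
    ("AD", 0.44), ("AD", 0.77), ("AD", 0.91),
]

THRESHOLD = 0.5  # a subject is declared AD when the score is above this


def confusion_counts(samples, threshold=THRESHOLD):
    """Count true/false positives and negatives for a score threshold."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for diagnosis, score in samples:
        predicted_ad = score > threshold
        if predicted_ad and diagnosis == "AD":
            counts["TP"] += 1   # AD case with a high score
        elif predicted_ad:
            counts["FP"] += 1   # NL case with a high score (false positive)
        elif diagnosis == "AD":
            counts["FN"] += 1   # AD case with a low score (false negative)
        else:
            counts["TN"] += 1   # NL case with a low score
    return counts


counts = confusion_counts(samples)
print(counts)
```

Moving the threshold trades false positives against false negatives, which is one reason a single cut-off rarely separates the two groups perfectly.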

As mentioned above, manually prepared transcripts were used for these results, since automatic speaker-independent speech recognition is very challenging for small highly variable data sets.  To be viable, the test should be completely automatic.  Accordingly, the main emphasis of the research presented at this conference is the design of an automatic speech-to-text system and automatic pause recognizer, taking into account the special features of the type of speech used for this test of dementia.

2pSCb11 – Effect of Menstrual Cycle Hormone Variations on Dichotic Listening Results

Richard Morris – Richard.morris@cci.fsu.edu
Alissa Smith

Florida State University
Tallahassee, Florida

Popular version of poster presentation 2pSCb11, “Effect of menstrual phase on dichotic listening”
Presented Tuesday afternoon, November 3, 2015, 3:30 PM, Grand Ballroom 8

How speech is processed by the brain has long been of interest to researchers and clinicians. One method to evaluate how the two sides of the brain work when hearing speech is called a dichotic listening task. In a dichotic listening task two words are presented simultaneously to a participant’s left and right ears via headphones. One word is presented to the left ear and a different one to the right ear. These words are spoken at the same pitch and loudness levels. The listener then indicates what word was heard. If the listener regularly reports hearing the words presented to one ear, then there is an ear advantage. Since most language processing occurs in the left hemisphere of the brain, most listeners attend more closely to the right ear. The regular selection of the word presented to the right ear is termed a right ear advantage (REA).
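
As a rough illustration of how an ear advantage could be quantified, the sketch below counts how often the reported item matches the right-ear item, the left-ear item, or neither. The trial format, the function name, and the example trials are assumptions made for this illustration; they are not taken from the study.

```python
def ear_report_rates(trials):
    """Share of trials on which the right-ear item, the left-ear item,
    or neither of them (an error) was reported.

    Each trial is a (left_item, right_item, reported_item) tuple, and the
    two presented items are assumed to differ.
    """
    total = len(trials)
    right = sum(1 for _, right_item, reported in trials if reported == right_item)
    left = sum(1 for left_item, _, reported in trials if reported == left_item)
    errors = total - right - left
    return {"right": right / total, "left": left / total, "error": errors / total}


# Hypothetical trials, invented for illustration only.
trials = [
    ("ba", "ta", "ta"),  # right-ear item reported
    ("ga", "pa", "pa"),  # right-ear item reported
    ("da", "ka", "da"),  # left-ear item reported
    ("pa", "ba", "ba"),  # right-ear item reported
    ("ka", "ga", "ta"),  # neither item reported (error)
]

rates = ear_report_rates(trials)
shows_right_ear_advantage = rates["right"] > rates["left"]
print(rates, shows_right_ear_advantage)
```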

Previous researchers reported different responses from males and females to dichotic presentation of words. Those investigators found that males more consistently heard the word presented to the right ear and demonstrated a stronger REA. The female listeners in those studies were more variable in which ear’s word they reported hearing. Further research seemed to indicate that women exhibit different lateralization of speech processing at different phases of their menstrual cycle. In addition, data from recent studies indicate that the degree to which women can focus on the input to one ear or the other varies with their menstrual cycle.

However, the previous studies used a small number of participants. The purpose of the present study was to complete a dichotic listening study with a larger sample of female participants. In addition, the previous studies focused on women who did not take oral contraceptives, because women taking oral contraceptives were assumed to have smaller shifts in the lateralization of speech processing. Although this assumption is reasonable, it needs to be tested. For this study, it was hypothesized that the women would exhibit a greater REA during the days that they menstruate than during other days of their menstrual cycle. This hypothesis was based on the previous research reports. In addition, it was hypothesized that the women taking oral contraceptives would exhibit smaller fluctuations in the lateralization of their speech processing.

Participants in the study were 64 females, 19-25 years of age. Among the women 41 were taking oral contraceptives (OC) and 23 were not. The participants listened to the sound files during nine sessions that occurred once per week. All of the women were in good general health and had no speech, language, or hearing deficits.

The dichotic listening task was run using the Alvin software package for speech perception research. The sound file consisted of consonant-vowel syllables composed of the six plosive consonants /b/, /d/, /g/, /p/, /t/, and /k/ paired with the vowel “ah”. The listeners heard the syllables over stereo headphones. Each listener set the loudness of the syllables to a comfortable level.

At the beginning of the listening session, each participant wrote down the date of the initiation of her most recent menstrual period on a participant sheet identified by her participant number. Then she heard the recorded syllables and indicated the consonant heard by striking that key on the computer keyboard. Each listening session consisted of three presentations of the syllables. There were different randomizations of the syllables for each presentation. In the first presentation, the stimuli were presented in a non-forced condition. In this condition the listener indicated the plosive that she heard most clearly. After the first presentation, the experimental files were presented in a manner referred to as a forced left or right condition. In these two conditions the participant was directed to focus on the signal in the left or right ear. The sequence of focus on signal to the left ear or to the right ear was counterbalanced over the sessions.
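
The session structure described above can be sketched in a few lines. This is only an illustration under stated assumptions: the study does not say how many syllable pairs were used or how the counterbalancing was done, so the sketch assumes every ordered pair of two different syllables and simply alternates the forced-condition order from one session to the next. All names are hypothetical.

```python
import itertools
import random

CONSONANTS = ["b", "d", "g", "p", "t", "k"]  # the six plosives named in the text


def dichotic_pairs():
    """All ordered (left, right) syllable pairs with different consonants.

    Using every ordered pair is an assumption made for this sketch.
    """
    syllables = [consonant + "a" for consonant in CONSONANTS]
    return list(itertools.permutations(syllables, 2))


def session_plan(session_index, seed=0):
    """Three presentations for one session, each with its own random order.

    The first presentation is non-forced; the order of the forced-left and
    forced-right presentations alternates between sessions (an assumed
    counterbalancing scheme).
    """
    rng = random.Random(seed * 1000 + session_index)
    forced = ["forced-left", "forced-right"]
    if session_index % 2 == 1:
        forced.reverse()
    plan = []
    for condition in ["non-forced"] + forced:
        order = dichotic_pairs()
        rng.shuffle(order)  # a different randomization for each presentation
        plan.append((condition, order))
    return plan


for condition, order in session_plan(session_index=0):
    print(condition, order[:3])
```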

The statistical analyses of the listeners’ responses revealed that no significant differences occurred between the women using oral contraceptives and those who did not. In addition, correlations between the day of the women’s menstrual cycle and their responses were consistently low. However, some patterns did emerge for the women’s responses across the experimental sessions as opposed to the days of their menstrual cycle. The participants in both groups exhibited a higher REA and lower percentage of errors for the final sessions in comparison to earlier sessions.

The results from the current subjects differ from those previously reported. Possibly the larger sample size of the current study, the additional month of data collection, or the data recording method affected the results. The larger sample size might have better represented how most women respond to dichotic listening tasks. The additional month of data collection may have allowed the women to learn how to respond to the task and then respond in a more consistent manner. The shorter data collection periods of the previous studies may have confounded learning to respond to a novel task with a hormonally dependent response. Finally, previous studies had the experimenter record the subjects’ responses. That method of data recording may have added bias to the data collection. Further studies with large data sets and multiple months of data collection are needed to determine any sex and oral contraceptive use effects on REA.

2aAA9 – Quietly Staying Fit in the Multifamily Building

Paulette Nehemias Harvey – pendeavors@gmail.com
Kody Snow – ksnow@phoenixnv.com
Scott Harvey – sharvey@phoenixnv.com

Phoenix Noise & Vibration
5216 Chairmans Court, Suite 107
Frederick, Maryland 21703

Popular version of paper 2aAA9, “Challenges facing fitness center designers in multifamily buildings”
Presented Tuesday morning, November 3, 2015, 11:00 AM, Grand Ballroom 3
Session 2aAA, Acoustics of Multifamily Dwellings
170th ASA Meeting, Jacksonville

Please keep in mind that the research described in this Lay Language Paper may not have yet been peer reviewed.

Transit centered living relies on amenities close to home, mixing multifamily residential units with professional, retail and commercial units on the same site. Residents use the nearby trains to get to work and to go out, but rely on the immediate neighborhood, even the lobby, for errands and everyday needs. Transit centered living is appealing as it eliminates the need for sitting in traffic, seems good for the environment and adds a sense of security, aerobic health and time-saving convenience. Include an on-site fitness center and residents don’t even have to wear street clothes to get to their gym!

Developers know that a state-of-the-art fitness center is icing on their multifamily residence cake as far as attracting buyers. Gone is the little interior room with a couple treadmills and a stationary bike. Today’s designs include panoramic views, and enough ellipticals, free weights, weight & strength machines, and shower rooms to fill 2500-4000 square feet, not to mention the large classes offered with high energy music and an enthusiastic leader with a microphone. The increased focus on maintaining aerobic health, strength and mobility is fantastic, but the noise and vibration it generates? Not so great. Sometimes cooperative scheduling keeps the peace, but often residents will want to have access to their fitness center at all hours, so wise project leaders involve a noise control engineer early in the design process to develop a fitness center next to which everyone will want to live.

Remember the string and two empty cans? Stretch the string taut and conversations can travel the length of the string, but pinch the string and the system fails. As noise travels through all kinds of structures and through the air as well, it is the design goal of the noise and vibration experts to prevent that transmission. Airborne noise control can be effective using a combination of layered gypsum board, fiberglass batt insulation, concrete and resilient framing members that absorb the sound rather than transmit it through a wall or a floor/ceiling system. Controlling the structure borne noise and vibration can involve much thicker rubber mats, isolated concrete slabs and a design that incorporates the structural engineer’s input on stiffening the base building structure. And it’s not simply noise that the design is intended to restrict, it is silent, but annoying vibrations as well.


Reducing the floor-shaking impact of dropping barbells on the ground is the opposite of hearing a pin drop. Heavy steel plates loaded on a barbell, lifted 6-8 feet off the ground and then dropped. Repeatedly. Nobody wants to live under that, so designers think location, location, location. But big windows are pointless in the basement, so something has to go under the fitness center. Garage space, storage units or mechanical rooms won’t mind the calisthenics above them. And sometimes the overall design of the building structure, whether it be a high-rise with underground parking, a Texas wrap building (a U-shaped building with an elevated parking garage in the interior), or a podium style building, can offer an ideal location for this healthy necessity.

It’s not an acoustical trade secret that the best method of noise control is at the source, so consider what makes the noise. Manufacturers have met the demand for replacing the old standard graduated steel barbell plates used for free weight combinations with rubber- or urethane-coated steel weights. These weights make much less noise when impacting each other, but are still capable of generating excessive structure-borne noise levels. This illustrates why both the airborne (plates clanking together) and structure-borne (barbells impacting the floor) transmission paths have to be controlled. Speakers and sound systems and the wall/floor/ceiling systems can work together to offer clarity and quality to listeners and limitations for what the neighbors will hear, but it takes expertise and attention.

Disregarding the recommendations of noise and vibration professionals can result in an annoying, on-site gym that brings stressful tension and ongoing conflict, nothing that promotes healthy well-being.

Foresight in design and attention to acoustical specs on building materials, under the direction of a noise and vibration engineer, assures a fitness center that is a pleasant, effective space for fitness and social opportunities, an asset to the transit centered neighborhood. Do everyone a favor and pay attention to good design and product specification early on; that’s sound advice.

5aMU1 – The inner ear as a musical instrument

Brian Connolly – bconnolly1987@gmail.com
Music Department
Logic House
South Campus
Maynooth University
Co. Kildare
Ireland

Popular version of paper 5aMU1, “The inner ear as a musical instrument”
Presented Friday morning, November 6, 2015, 8:30 AM, Grand Ballroom 2
170th ASA meeting Jacksonville
See also: The inner ear as a musical instrument – POMA

(please use headphones for listening to all audio samples)

Did you know that your ears could sing? You may be surprised to hear that they, in fact, have the capacity to make particularly good performers and recent psychoacoustics research has revealed the true potential of the ears within musical creativity. ‘Psychoacoustics’ is loosely defined as the study of the perception of sound.

Figure 1: The Ear


A good performer can carry out required tasks reliably and without errors. In many respects the very straightforward nature of the ear’s responses to certain sounds makes the ear a very reliable performer: its behaviour can be predicted, and so it is easily controlled. In the context of the listening system, the inner ear has the ability to behave as a highly effective instrument which can create its own sounds, and many experimental musicians have been using these sounds to turn the listeners’ ears into participating performers in the realization of their music.

One of the most exciting avenues of musical creativity is the psychoacoustic phenomenon known as otoacoustic emissions. These are tones which are created within the inner ear when it is exposed to certain sounds. One example of these emissions is ‘difference tones.’ When two clear frequencies enter the ear at, say, 1,000 Hz and 1,200 Hz, the listener will hear these two tones, as expected, but the inner ear will also create its own third frequency at 200 Hz because this is the mathematical difference between the two original tones. The ear literally sends a 200 Hz tone back out in reverse through the ear, and this sound can be detected by an in-ear microphone, a process which doctors carrying out hearing tests on babies use as an integral part of their examinations. This means that composers can create certain tones within their work and predict that the listeners’ ears will also add their extra dimension to the music upon hearing it. Within certain loudness and frequency ranges, the listeners will also be able to feel their ears buzzing in response to these stimulus tones! This makes for a very exciting and new layer to contemporary music making and listening.
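
The arithmetic behind the difference tone, and a stimulus of the kind described, can be sketched as follows. This is a minimal illustration, not the code used to make the audio samples; the function names, sample rate, and amplitude are assumptions.

```python
import math


def difference_tone_hz(f1_hz, f2_hz):
    """Frequency of the difference tone for two stimulus tones."""
    return abs(f2_hz - f1_hz)


def two_tone(f1_hz, f2_hz, seconds=1.0, sample_rate=44100, amplitude=0.4):
    """Samples of two equal-level sine tones added together (mono).

    The stimulus contains only f1 and f2; per the text, the difference
    tone is created in the listener's ear, not in the signal.
    An amplitude of 0.4 per tone keeps the sum within -0.8..0.8.
    """
    n = int(seconds * sample_rate)
    return [
        amplitude * (math.sin(2 * math.pi * f1_hz * i / sample_rate)
                     + math.sin(2 * math.pi * f2_hz * i / sample_rate))
        for i in range(n)
    ]


stimulus = two_tone(1000, 1200)
print(difference_tone_hz(1000, 1200))  # 200
```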

First listen to this tone. This is very close to the sound your ear will sing back during the second example.

Insert – 200.mp3

Here is the second sample containing just two tones at 1,000 Hz and 1,200 Hz. See if you can also hear the very low, buzzing difference tone. It is not being sent into your ear; it is being created in your ear and sent back out towards your headphones!

Insert – 1000and1200.mp3

If you could hear the 200Hz difference tone in the previous example, have a listen to this much more complex demonstration which will make your ears sing a well known melody. It is important to try to not listen to the louder impulsive sounds and see if you can hear your ears humming along to perform the tune of Twinkle, Twinkle, Little Star at a much lower volume!

(NB: The difference tones will start after about 4 seconds of impulses)

Insert – Twinkle.mp3

Auditory beating is another phenomenon which has caught the interest of many contemporary composers. In the below example you will hear the following: 400Hz in your left ear and 405Hz in your right ear.

First play the below sample by placing the headphones into your ears just one at a time. Not together. You will hear two clear tones when you listen to them separately.

Insert – 400and405beating.mp3

Now try and see what happens when you place them into your ears simultaneously. You will be unable to hear these two tones together. Instead, you will hear a fused tone which beats five times per second. This is because each of your ears is sending electrical signals to the brain telling it what frequency it is responding to, but these two frequencies are too close together. A perceptual confusion occurs, and the result is a single combined tone which beats at a rate equal to the mathematical difference between the two tones.
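
A stimulus like this one, with a different tone in each ear, can be sketched with Python’s standard library. This is an illustrative assumption about how such a file might be built, not the method used for the sample above; the function names and default values are hypothetical.

```python
import io
import math
import struct
import wave


def beat_rate_hz(left_hz, right_hz):
    """Perceived beat rate: the difference between the two frequencies."""
    return abs(right_hz - left_hz)


def dichotic_tones_wav(left_hz=400.0, right_hz=405.0, seconds=2.0,
                       sample_rate=44100, amplitude=0.5):
    """Return 16-bit stereo WAV bytes with one sine tone per channel."""
    frames = bytearray()
    for i in range(int(seconds * sample_rate)):
        t = i / sample_rate
        left = int(32767 * amplitude * math.sin(2 * math.pi * left_hz * t))
        right = int(32767 * amplitude * math.sin(2 * math.pi * right_hz * t))
        frames += struct.pack("<hh", left, right)  # interleaved left, right
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(2)      # stereo: left ear, right ear
        wav.setsampwidth(2)      # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return buffer.getvalue()


wav_bytes = dichotic_tones_wav()
print(beat_rate_hz(400, 405))  # 5
```

Writing `wav_bytes` to a file ending in `.wav` and playing it over headphones should reproduce the effect; over loudspeakers the two tones mix in the air instead of staying one per ear.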

Auditory beating becomes particularly interesting in pieces of music written for surround sound environments when the proximity of the listener to the various speakers plays a key factor and so simply turning one’s head in these scenarios can often entirely change the colour of the sound as different layers of beating will alter the overall timbre of the sound.

So how can all of these be meaningful to composers and listeners alike? The examples shown here are intended to be basic and provide proofs of concept more so than anything else. In the much more complex world of music composition the scope for the employment of such material is seemingly endless. Considering the ear as a musical instrument gives the listener the opportunity to engage with sound and music in a more intimate way than ever before.

Brian Connolly’s compositions which explore such concepts in greater detail can be found at www.soundcloud.com/brianconnolly-1