Using Automatic Speech Recognition to Identify Dementia in Early Stages
Roozbeh Sadeghian, J. David Schaffer, and Stephen A. Zahorian Rsadegh1@binghamton.edu SUNY at Binghamton
Popular version of paper 2aSP5, “Using automatic speech recognition to identify dementia in early stages”
Presented Tuesday morning, November 3, 2015, 10:15 AM, City Terrace room
170th ASA Meeting, Jacksonville, FL
The clinical diagnosis of Alzheimer’s disease (AD) and other dementias is very challenging, especially in the early stages. These conditions are widely believed to be underdiagnosed, at least partly because of the lack of a reliable non-invasive diagnostic test. Additionally, recruitment for clinical trials of experimental dementia therapies might be improved with a highly specific test. Although there is much active research into new biomarkers for AD, most of these methods are expensive and/or invasive, such as brain imaging (often with radioactive tracers) or taking blood or spinal fluid samples followed by expensive lab procedures.
There are good indications that dementias can be characterized by several aphasias (defects in the use of speech). This seems plausible because speech production involves many brain regions, so a disease that affects particular regions involved in speech processing might leave detectable fingerprints in the speech. Computerized analysis of speech signals and computational linguistics (analysis of word patterns) have progressed to the point where an automatic speech analysis system could be within reach as a tool for detecting dementia. The long-term goal is an inexpensive, short, non-invasive test for dementia, one that can be administered in an office or at home by clinicians with minimal training.
If a pilot study (cross-sectional design: only one sample from each subject) indicates that suitable combinations of features derived from a voice sample can strongly indicate disease, the research will move to a longitudinal design (many samples collected over time), in which sizable cohorts will be followed so that early indicators might be discovered.
A simple procedure for acquiring speech samples is to ask subjects to describe a picture (see Figure 1). Some such samples are available on the web (DementiaBank), but they were collected long ago and their audio quality is often poor. We used 140 of these older samples and also collected 71 new samples with good-quality audio. Roughly half of the samples came from subjects with a clinical diagnosis of probable AD; the others came from demographically similar, cognitively normal (NL) subjects.
One hundred twenty-eight features were automatically extracted from the speech signals, including pauses and pitch variation (indicating emotion); word-use features were extracted from manually prepared transcripts. In addition, we had the results of a popular cognitive test, the Mini-Mental State Examination (MMSE), for all subjects. Although the MMSE is widely used as an indicator of cognitive difficulties, it is not sufficiently diagnostic for dementia by itself. We searched for patterns both with and without the MMSE, which opens the possibility of a clinical test that combines speech with the MMSE. Multiple patterns were found using an advanced pattern discovery approach (genetic algorithms with support vector machines). The performance of two example patterns is shown in Figure 2. The training samples (red circles) were used to discover the patterns, so we expect them to perform well. The validation samples (blue) were not used for learning, only to test the discovered patterns. If a subject is declared AD when the test score is > 0.5 (the red line in Figure 2), we can see some errors: in the left panel there is one false positive (an NL case with a high test score, blue triangle) and several false negatives (AD cases with low scores, red circles).
Figure 1. The pictures used for recording samples: (a) the famous “Cookie Theft” picture (DementiaBank samples) and (b) the picture used for the newly recorded samples.
Figure 2. Two discovered diagnostic patterns: left, with MMSE; right, without MMSE. The normal subjects are to the left in each plot (low scores) and the AD subjects to the right (high scores). No perfect pattern has yet been discovered.
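To make the "declare AD if the score is > 0.5" rule concrete, here is a minimal sketch of how a threshold turns pattern scores into AD/NL calls and how false positives and false negatives are tallied. The scores and labels are made-up illustrative values, not data from the study, and the function names `confusion_counts` and `sensitivity_specificity` are hypothetical helpers, not part of the authors' software.

```python
from typing import Dict, Sequence, Tuple

def confusion_counts(scores: Sequence[float], labels: Sequence[str],
                     threshold: float = 0.5) -> Dict[str, int]:
    """Tally outcomes when a subject is declared AD if score > threshold."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for score, label in zip(scores, labels):
        declared_ad = score > threshold
        if declared_ad:
            # High score: correct if truly AD, otherwise a false positive (NL called AD).
            counts["TP" if label == "AD" else "FP"] += 1
        else:
            # Low score: a false negative if truly AD, otherwise a correct NL call.
            counts["FN" if label == "AD" else "TN"] += 1
    return counts

def sensitivity_specificity(counts: Dict[str, int]) -> Tuple[float, float]:
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP). Assumes both classes are present."""
    sensitivity = counts["TP"] / (counts["TP"] + counts["FN"])
    specificity = counts["TN"] / (counts["TN"] + counts["FP"])
    return sensitivity, specificity

# Illustrative scores and labels only; these are NOT data from the study.
scores = [0.10, 0.35, 0.62, 0.80, 0.45, 0.91, 0.30, 0.55]
labels = ["NL", "NL", "NL", "AD", "AD", "AD", "NL", "AD"]

counts = confusion_counts(scores, labels)
print(counts)                           # {'TP': 3, 'FP': 1, 'TN': 3, 'FN': 1}
print(sensitivity_specificity(counts))  # (0.75, 0.75)
```

Moving the threshold trades one error type for the other: raising it reduces false positives (favoring the highly specific test mentioned earlier) at the cost of more false negatives.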
As mentioned above, manually prepared transcripts were used for these results, since automatic speaker-independent speech recognition is very challenging for small, highly variable data sets. To be viable, the test should be completely automatic. Accordingly, the main emphasis of the research presented at this conference is the design of an automatic speech-to-text system and an automatic pause recognizer that take into account the special features of the type of speech used for this dementia test.
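As an illustration of what an automatic pause recognizer has to do, here is a minimal sketch of a simple energy-threshold pause detector, a common baseline technique and not necessarily the system the authors designed. The frame length (20 ms), RMS threshold (0.01), and minimum pause length (200 ms) are assumed values chosen for the example, and `detect_pauses` is a hypothetical name.

```python
import math
from typing import List, Tuple

def detect_pauses(samples: List[float], sample_rate: int,
                  frame_ms: float = 20.0,
                  energy_threshold: float = 0.01,
                  min_pause_ms: float = 200.0) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) for low-energy stretches lasting at least min_pause_ms."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    n_frames = len(samples) // frame_len  # any trailing partial frame is ignored
    # 1. Mark each frame as silent if its RMS amplitude falls below the threshold.
    silent = []
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        silent.append(rms < energy_threshold)
    # 2. Merge consecutive silent frames into runs; keep runs long enough to count as pauses.
    min_frames = math.ceil(min_pause_ms / frame_ms)
    pauses = []
    run_start = None
    for i, is_silent in enumerate(silent + [False]):  # sentinel closes a trailing run
        if is_silent and run_start is None:
            run_start = i
        elif not is_silent and run_start is not None:
            if i - run_start >= min_frames:
                pauses.append((run_start * frame_len / sample_rate,
                               i * frame_len / sample_rate))
            run_start = None
    return pauses

# Synthetic test signal at 8 kHz: 0.5 s tone, 0.4 s silence, 0.5 s tone.
sr = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / sr) for n in range(sr // 2)]
silence = [0.0] * (sr * 4 // 10)
signal = tone + silence + tone
print(detect_pauses(signal, sr))  # [(0.5, 0.9)]
```

Real recordings of elderly speakers contain breathing, background noise, and filled pauses ("um"), which is exactly why a fixed energy threshold like this one is only a starting point for the specialized recognizer described above.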
Popular version of poster presentation 2pSCb11, “Effect of menstrual phase on dichotic listening”
Presented Tuesday afternoon, November 3, 2015, 3:30 PM, Grand Ballroom 8
How speech is processed by the brain has long been of interest to researchers and clinicians. One method to evaluate how the two sides of the brain work when hearing speech is called a dichotic listening task. In a dichotic listening task, two words are presented simultaneously to a participant’s left and right ears via headphones: one word to the left ear and a different one to the right ear. The words are spoken at the same pitch and loudness levels. The listener then indicates which word was heard. If the listener regularly reports hearing the words presented to one ear, then there is an ear advantage. Since most language processing occurs in the left hemisphere of the brain, and the right ear’s input reaches the left hemisphere most directly, most listeners attend more closely to the right ear. The regular selection of the word presented to the right ear is termed a right ear advantage (REA).
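One common way to put a number on an ear advantage in dichotic listening research is a percent laterality index, 100 × (R − L) / (R + L), where R and L count the correctly reported right-ear and left-ear items. The summary does not state which measure this study used, so the sketch below is illustrative only; `laterality_index`, `ear_advantage`, and the counts are hypothetical.

```python
def laterality_index(right_correct: int, left_correct: int) -> float:
    """Percent laterality index, 100 * (R - L) / (R + L).

    Positive values indicate a right ear advantage (REA), negative values a
    left ear advantage, and 0 no ear advantage.
    """
    total = right_correct + left_correct
    if total == 0:
        return 0.0  # no correct reports from either ear: no advantage measurable
    return 100.0 * (right_correct - left_correct) / total

def ear_advantage(li: float) -> str:
    """Label the sign of a laterality index."""
    if li > 0:
        return "REA"
    if li < 0:
        return "LEA"
    return "none"

# Hypothetical counts for one session: 40 right-ear and 25 left-ear syllables reported.
li = laterality_index(40, 25)
print(round(li, 1), ear_advantage(li))  # 23.1 REA
```

Normalizing by R + L lets sessions with different numbers of correct reports be compared on the same scale.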
Previous researchers reported different responses from males and females to the dichotic presentation of words. Those investigators found that males more consistently heard the word presented to the right ear and demonstrated a stronger REA. The female listeners in those studies were more variable in which ear’s word they heard. Further research seemed to indicate that women exhibit different lateralization of speech processing at different phases of their menstrual cycle. In addition, data from recent studies indicate that the degree to which women can focus on the input to one ear or the other varies with their menstrual cycle.
However, the previous studies used small numbers of participants. The purpose of the present study was to complete a dichotic listening study with a larger sample of female participants. In addition, the previous studies focused on women who did not take oral contraceptives, because women taking them were assumed to have smaller shifts in the lateralization of speech processing. Although this assumption is reasonable, it needs to be tested. For this study, based on the previous research reports, it was hypothesized that the women would exhibit a greater REA during the days that they menstruate than during other days of their menstrual cycle. In addition, it was hypothesized that the women taking oral contraceptives would exhibit smaller fluctuations in the lateralization of their speech processing.
Participants in the study were 64 females, 19-25 years of age. Of these women, 41 were taking oral contraceptives (OC) and 23 were not. The participants listened to the sound files during nine sessions held once per week. All of the women were in good general health and had no speech, language, or hearing deficits.
The dichotic listening task was executed using the Alvin software package for speech perception research. The sound file consisted of consonant-vowel syllables composed of the six plosive consonants /b/, /d/, /g/, /p/, /t/, and /k/ paired with the vowel “ah”. The listeners heard the syllables over stereo headphones, and each listener set the loudness of the syllables to a comfortable level.
At the beginning of each listening session, each participant wrote down the date on which her most recent menstrual period began, on a participant sheet identified by her participant number. She then heard the recorded syllables and indicated the consonant she heard by striking the corresponding key on the computer keyboard. Each listening session consisted of three presentations of the syllables, with a different randomization of the syllables for each presentation. In the first presentation, the stimuli were presented in a non-forced condition, in which the listener indicated the plosive that she heard most clearly. After the first presentation, the experimental files were presented in what are referred to as the forced-left and forced-right conditions, in which the participant was directed to focus on the signal in the left or right ear, respectively. The order of the forced-left and forced-right conditions was counterbalanced over the sessions.
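The session structure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's Alvin configuration: it assumes each presentation plays every pairing of two different syllables once (6 × 5 = 30 left/right pairs), and it counterbalances by simply alternating the forced-left/forced-right order from session to session. `dichotic_pairs` and `session_plan` are hypothetical names.

```python
import itertools
import random
from typing import List, Tuple

SYLLABLES = ["ba", "da", "ga", "pa", "ta", "ka"]  # /b d g p t k/ + "ah"

def dichotic_pairs() -> List[Tuple[str, str]]:
    """Every (left-ear, right-ear) pairing of two different syllables: 6 * 5 = 30."""
    return list(itertools.permutations(SYLLABLES, 2))

def session_plan(session: int, seed: int = 0) -> List[Tuple[str, List[Tuple[str, str]]]]:
    """Build the three presentations for one session (sessions numbered from 0).

    The non-forced presentation always comes first; the forced-left/forced-right
    order alternates between sessions as a simple counterbalancing scheme (an assumption).
    """
    rng = random.Random(seed * 1000 + session)  # reproducible per session
    forced = ["forced-left", "forced-right"]
    if session % 2 == 1:
        forced.reverse()
    plan = []
    for condition in ["non-forced"] + forced:
        trials = dichotic_pairs()
        rng.shuffle(trials)  # a fresh shuffle for each presentation
        plan.append((condition, trials))
    return plan

for session in range(2):
    print(session, [condition for condition, _ in session_plan(session)])
# 0 ['non-forced', 'forced-left', 'forced-right']
# 1 ['non-forced', 'forced-right', 'forced-left']
```

Over nine weekly sessions, this alternation puts forced-left first in five sessions and forced-right first in four; a fully balanced design would need an even number of sessions or a per-participant offset.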
The statistical analyses of the listeners’ responses revealed no significant differences between the women using oral contraceptives and those who did not. In addition, correlations between the day of the women’s menstrual cycle and their responses were consistently low. However, some patterns did emerge in the women’s responses across the experimental sessions, as opposed to across the days of their menstrual cycle. The participants in both groups exhibited a higher REA and a lower percentage of errors in the final sessions than in earlier sessions.
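The cycle-day analysis mentioned above needs two ingredients: the day of the cycle on each session date (from the period onset date each participant recorded) and a correlation between that day and a response measure. The sketch below assumes a Pearson correlation and a per-session laterality index as the response measure; the summary does not specify the actual statistics, and the records, `cycle_day`, and `pearson_r` are hypothetical.

```python
import math
from datetime import date
from typing import List

def cycle_day(period_start: date, session_date: date) -> int:
    """Day of the menstrual cycle on the session date, counting onset as day 1."""
    return (session_date - period_start).days + 1

def pearson_r(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient; assumes equal lengths and neither list constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical records for one participant (NOT study data):
# (reported period onset, session date, laterality index for that session).
records = [
    (date(2015, 9, 1), date(2015, 9, 3), 18.0),
    (date(2015, 9, 1), date(2015, 9, 10), 22.0),
    (date(2015, 9, 1), date(2015, 9, 17), 15.0),
    (date(2015, 9, 1), date(2015, 9, 24), 25.0),
    (date(2015, 9, 29), date(2015, 10, 1), 20.0),
]
days = [cycle_day(onset, session) for onset, session, _ in records]
scores = [li for _, _, li in records]
print(days)  # [3, 10, 17, 24, 3]
print(round(pearson_r(days, scores), 2))
```

Because sessions were weekly, each participant contributes only a handful of cycle days, which is one reason session order (learning) and cycle day can be hard to separate.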
The results from the current subjects differ from those previously reported. Possibly the larger sample size of the current study, the additional month of data collection, or the data recording method affected the results. The larger sample size might better represent how most women respond to dichotic listening tasks. The additional month of data collection may have allowed the women to learn how to respond to the task and then respond more consistently; in previous studies, the shorter data collection period may have confounded learning to respond to a novel task with a hormonally dependent response. Finally, in previous studies the experimenter recorded the subjects’ responses, which may have introduced bias into the data collection. Further studies with large data sets and multiple months of data collection are needed to determine any effects of sex and oral contraceptive use on the REA.
Popular version of poster 1pAB6
Presented Monday afternoon, November 2, 2015, 3:25 PM – 3:45 PM, City Terrace 9
170th ASA Meeting, Jacksonville
More than one hundred years ago, US clinician James Spalding first described an interesting phenomenon in tinnitus patients, who suffer from a perceived phantom ringing [1]. Many of his patients reported that a loud, long-lasting sound produced by a violin or piano made their tinnitus disappear for about a minute after the sound was presented. Nearly 70 years later, the first scientific study was conducted to investigate how this phenomenon, termed residual inhibition, is able to provide tinnitus relief [2]. Further research has been conducted to understand the basic properties of this “inhibition of ringing” and to identify which sounds are most effective at producing residual inhibition [3].
The research indicated that residual inhibition is indeed an internal mechanism for temporary tinnitus suppression. However, little is currently known about the neural mechanisms underlying residual inhibition. Increased knowledge about residual inhibition may not only shed light on the cause of tinnitus but may also open an opportunity to develop an effective tinnitus treatment.
For the last four years, we have studied a fascinating phenomenon of sound processing in neurons of the auditory system that may explain what causes residual inhibition in tinnitus patients. After presenting a sound to a normal-hearing animal, we observed that the spontaneous firing activity of auditory neurons is suppressed [4, 5]. There are several striking similarities between this suppression in the normal auditory system and the residual inhibition observed in tinnitus patients:
Relatively loud sounds trigger both the neuronal firing suppression and residual inhibition.
Both the suppression and residual inhibition last for the same amount of time after a sound, and increasing the duration of the sound makes both phenomena last longer.
Simple tones produce more robust suppression and residual inhibition compared to complex sounds or noises.
Multiple attempts to induce suppression or residual inhibition within a short timeframe make both much weaker.
These similarities lead us to believe that the normal sound-induced suppression of spontaneous firing is an underlying mechanism of residual inhibition.
The most unexpected outcome of our research is that the suppression underlying residual inhibition, a phenomenon studied in tinnitus patients, appears to be a natural feature of sound processing, because suppression was observed both in normal-hearing mice and in mice with tinnitus. If so, why do people with tinnitus experience residual inhibition, whereas those without tinnitus do not?
Hyperactivity in auditory regions of the brain is well known to be linked to tinnitus: in tinnitus, auditory neurons have elevated spontaneous firing rates. The brain then interprets this hyperactivity as phantom sound. Therefore, suppression of this increased activity by a loud sound should eliminate or reduce the tinnitus. Normal-hearing people also experience this suppression after loud sounds. However, the spontaneous firing of their auditory neurons is low enough to begin with that they never perceive the phantom ringing that tinnitus sufferers do. Thus, even though loud sounds suppress neuronal firing in normal-hearing people, the suppression is not perceived.
Most importantly, our research has helped us identify a group of drugs that can alter this suppression response, as well as the spontaneous firing of the auditory neurons responsible for tinnitus. These drugs will be further investigated in our future research to develop effective tinnitus treatments.
This research was supported by research grant R01 DC011330 from the National Institute on Deafness and Other Communication Disorders of the U.S. Public Health Service.
[1] Spalding J.A. (1903). Tinnitus, with a plea for its more accurate musical notation. Archives of Otology, 32(4), 263-272.
[2] Feldmann H. (1971). Homolateral and contralateral masking of tinnitus by noise-bands and by pure tones. International Journal of Audiology, 10(3), 138-144.
[3] Roberts L.E. (2007). Residual inhibition. In Tinnitus: Pathophysiology and Treatment, Progress in Brain Research, 166, 487-495. Elsevier.