Using Automatic Speech Recognition to Identify Dementia in Early Stages
Roozbeh Sadeghian, J. David Schaffer, and Stephen A. Zahorian
SUNY at Binghamton
Popular version of paper 2aSP5, “Using automatic speech recognition to identify dementia in early stages”
Presented Tuesday morning, November 3, 2015, 10:15 AM, City Terrace room
170th ASA Meeting, Jacksonville, FL
The clinical diagnosis of Alzheimer’s disease (AD) and other dementias is very challenging, especially in the early stages. Dementia is widely believed to be underdiagnosed, at least partly because there is no reliable non-invasive diagnostic test. In addition, recruitment for clinical trials of experimental dementia therapies might be improved with a highly specific test. Although there is much active research into new biomarkers for AD, most of these methods are expensive, invasive, or both: brain imaging, often with radioactive tracers, or blood or spinal fluid sampling followed by costly lab procedures.
There are good indications that dementias can be characterized by several aphasias (defects in the use of speech). This is plausible because speech production involves many brain regions, so a disease that affects particular regions involved in speech processing might leave detectable fingerprints in the speech. Computerized analysis of speech signals and computational linguistics (analysis of word patterns) have progressed to the point where an automatic speech analysis system for detecting dementia could be within reach. The long-term goal is an inexpensive, short, non-invasive test for dementia, one that clinicians with minimal training can administer in an office or at home.
If a pilot study (cross-sectional design: only one sample from each subject) indicates that suitable combinations of features derived from a voice sample can strongly indicate disease, the research will move to a longitudinal design (many samples collected over time), in which sizable cohorts are followed so that early indicators might be discovered.
A simple procedure for acquiring speech samples is to ask subjects to describe a picture (see Figure 1). Some such samples are available on the web (DementiaBank), but they were collected long ago and their audio quality is often poor. We used 140 of these older samples and also collected 71 new samples with good-quality audio. Roughly half of the samples came from subjects with a clinical diagnosis of probable AD; the others came from demographically similar, cognitively normal (NL) subjects.
One hundred twenty-eight features were extracted automatically from the speech signals, including pauses and pitch variation (indicating emotion); word-use features were extracted from manually prepared transcripts. In addition, we had results for all subjects on a popular cognitive test, the Mini-Mental State Exam (MMSE). Although widely used as an indicator of cognitive difficulties, the MMSE is not sufficiently diagnostic for dementia by itself. We therefore searched for patterns both with and without the MMSE, which opens the possibility of a clinical test that combines speech analysis with the MMSE. Multiple patterns were found using an advanced pattern discovery approach (genetic algorithms combined with support vector machines).
The performance of two example patterns is shown in Figure 2. The training samples (red circles) were used to discover the patterns, so we expect them to perform well. The validation samples (blue) were not used for learning, only to test the discovered patterns. If a subject is declared AD whenever the test score is > 0.5 (the red line in Figure 2), some errors are visible: in the left panel there is one false positive (an NL case with a high test score, blue triangle) and several false negatives (AD cases with low scores, red circles).
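The pattern discovery step pairs a genetic algorithm (GA) with support vector machines. The Python sketch below shows only the GA side and rests on assumptions not stated in the text: each candidate pattern is encoded as a 0/1 mask over the extracted features, and its quality is scored by a caller-supplied fitness function. In the study's setting that function would plausibly train an SVM on the selected features and return a classification score; here a hypothetical toy fitness stands in so the example runs with the standard library alone. The name `evolve_feature_mask`, the operators (tournament selection, one-point crossover, bit-flip mutation, elitism), and all parameter values are illustrative choices, not the authors' configuration.

```python
import random
from typing import Callable, List, Sequence

# Illustrative GA sketch (assumption): not the authors' implementation.
Mask = List[int]  # 1 = feature included in the pattern, 0 = excluded


def evolve_feature_mask(
    n_features: int,
    fitness: Callable[[Sequence[int]], float],
    pop_size: int = 30,
    generations: int = 40,
    mutation_rate: float = 0.02,
    seed: int = 0,
) -> Mask:
    """Return the best feature mask found by a simple generational GA."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(population, key=fitness)

    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        # Elitism: carry the two best masks forward unchanged.
        next_gen = [scored[0][:], scored[1][:]]
        while len(next_gen) < pop_size:
            # Tournament selection: each parent is the fitter of two random masks.
            a = max(rng.sample(scored, 2), key=fitness)
            b = max(rng.sample(scored, 2), key=fitness)
            # One-point crossover.
            cut = rng.randrange(1, n_features)
            child = a[:cut] + b[cut:]
            # Bit-flip mutation.
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            next_gen.append(child)
        population = next_gen
        gen_best = max(population, key=fitness)
        if fitness(gen_best) > fitness(best):
            best = gen_best
    return best


# Hypothetical toy fitness: features 0, 3 and 7 are "informative"; each extra
# selected feature costs a small penalty, favoring compact patterns.
INFORMATIVE = {0, 3, 7}


def toy_fitness(mask: Sequence[int]) -> float:
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits - 0.1 * (sum(mask) - hits)


best_mask = evolve_feature_mask(n_features=12, fitness=toy_fitness)
print(best_mask, toy_fitness(best_mask))
```

Because a real SVM-based fitness is expensive, a practical version would cache one fitness value per mask instead of re-scoring masks during sorting and selection, as this sketch does.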
Figure 1. The pictures used for recording samples: (a) the famous “Cookie Theft” picture, used for the older samples, and (b) the picture used for the newly recorded samples.
Figure 2. Two discovered diagnostic patterns (left: with MMSE; right: without MMSE). Normal subjects fall to the left in each plot (low scores) and AD subjects to the right (high scores). No perfect pattern has yet been discovered.
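The decision rule described in the text, declaring a subject AD when the test score exceeds 0.5, can be made concrete with a short Python sketch that tallies true and false positives and negatives, then derives sensitivity and specificity. The function names are hypothetical, and the scores and labels in the example are invented for illustration; they are not data from the study.

```python
from typing import Dict, List, Tuple


def confusion_counts(scores: List[float], labels: List[str], threshold: float = 0.5) -> Dict[str, int]:
    """Tally outcomes of the rule 'declare AD if score > threshold'.

    Labels are "AD" (probable Alzheimer's disease) or "NL" (cognitively normal).
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for score, label in zip(scores, labels):
        if label not in ("AD", "NL"):
            raise ValueError(f"unexpected label: {label!r}")
        if score > threshold:
            counts["TP" if label == "AD" else "FP"] += 1  # FP: normal subject, high score
        else:
            counts["FN" if label == "AD" else "TN"] += 1  # FN: AD subject, low score
    return counts


def sensitivity_specificity(counts: Dict[str, int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    sensitivity = counts["TP"] / (counts["TP"] + counts["FN"])
    specificity = counts["TN"] / (counts["TN"] + counts["FP"])
    return sensitivity, specificity


# Invented example scores, not data from the study.
scores = [0.1, 0.3, 0.7, 0.6, 0.4, 0.9]
labels = ["NL", "NL", "NL", "AD", "AD", "AD"]
counts = confusion_counts(scores, labels)
print(counts)  # {'TP': 2, 'FP': 1, 'TN': 2, 'FN': 1}
print(sensitivity_specificity(counts))
```

Raising the threshold can only keep or reduce the number of false positives, at the cost of possibly more false negatives; this is the trade-off behind seeking a highly specific test.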
As mentioned above, manually prepared transcripts were used for these results, because automatic speaker-independent speech recognition is very challenging for small, highly variable data sets. To be viable, however, the test must be completely automatic. Accordingly, the main emphasis of the research presented at this conference is the design of an automatic speech-to-text system and an automatic pause recognizer, tailored to the special characteristics of the type of speech used in this dementia test.
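To illustrate what an automatic pause recognizer has to do, the Python sketch below marks pauses as runs of low-energy frames in a waveform. It is a generic energy-threshold detector, not the system described in the paper; the 20 ms frame length, the 0.02 RMS threshold, the 250 ms minimum pause, and the function names are assumed values chosen for the example and would need tuning on real recordings, especially noisy ones.

```python
import math
from typing import List, Tuple

# Generic energy-threshold pause detector (illustrative assumption, not the paper's recognizer).


def frame_rms(samples: List[float], frame_len: int) -> List[float]:
    """RMS energy of consecutive, non-overlapping frames (a partial last frame is dropped)."""
    return [
        math.sqrt(sum(x * x for x in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_pauses(
    samples: List[float],
    sample_rate: int,
    frame_ms: float = 20.0,
    threshold: float = 0.02,
    min_pause_ms: float = 250.0,
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) for each run of low-energy frames lasting >= min_pause_ms.

    Assumes samples are normalized to [-1, 1].
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    if frame_len < 1:
        raise ValueError("frame_ms is too short for this sample rate")
    rms = frame_rms(samples, frame_len)
    min_frames = math.ceil(min_pause_ms / frame_ms)
    pauses: List[Tuple[float, float]] = []
    run_start = None
    # A final +inf "loud" frame acts as a sentinel that closes a trailing pause.
    for i, energy in enumerate(rms + [math.inf]):
        if energy < threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_frames:
                pauses.append((run_start * frame_len / sample_rate, i * frame_len / sample_rate))
            run_start = None
    return pauses


# Synthetic signal at 1 kHz sampling: 0.5 s tone, 0.5 s silence, 0.5 s tone.
sr = 1000
tone = [0.5 * math.sin(2 * math.pi * 100 * n / sr) for n in range(sr // 2)]
signal = tone + [0.0] * (sr // 2) + tone
print(detect_pauses(signal, sr))  # [(0.5, 1.0)]
```

Pause counts and durations from such a detector could then serve as features; a real recognizer would also have to separate silent pauses from filled pauses and background noise.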