Detecting cognitive impairment from spoken language
Author(s)
Alhanai, Tuka (Tuka Waddah Talib Ali Al Hanai)
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
James R. Glass.
Abstract
Dementia comes second only to spinal cord injuries in terms of its debilitating effects, which range from memory loss to physical disability. The standard approach to evaluating cognitive conditions is the neuropsychological exam, conducted via in-person interview to measure memory, thinking, language, and motor skills. Work is ongoing to determine biomarkers of cognitive impairment, yet one modality that has been comparatively less explored is speech. Speech has the advantage of being easy to record, and it carries the majority of the information transmitted during neuropsychological exams. To determine the viability of speech-based biomarkers, we utilize data from the Framingham Heart Study, which contains hour-long audio recordings of neuropsychological exams for over 5,000 individuals. The data are representative of the population and of the real-world prevalence of cognitive conditions (3-4%). We first explore modeling cognitive impairment from a relatively small set of 92 subjects with complete information on audio, transcripts, and speaker turns. We then loosen these constraints by modeling with only a fraction of the audio (~2-3 minutes), with speaker segments defined through text-based diarization. We next apply this diarization method to extract audio features from all 7,000+ recordings (most of which have no transcripts) and model cognitive impairment (AUC 0.83, spec. 78%, sens. 79%). Finally, we eliminate the need for feature engineering by training a neural network to learn higher-order representations from filterbank features (AUC 0.85, spec. 81%, sens. 82%). Our speech models exhibit strong performance and are comparable to the baseline demographic model (AUC 0.85, spec. 93%, sens. 65%). Further analysis shows that our neural network model automatically learns to detect specific speech activity which clusters according to: a pause followed by the onset of speech, short bursts of speech, speech activity in high-frequency spectral energy bands, and silence.
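The abstract describes classifiers built on filterbank features and evaluated by AUC, specificity, and sensitivity. The sketch below illustrates that kind of pipeline, assuming Python with librosa and scikit-learn; the 40-dimensional log-mel features, the logistic-regression classifier, the synthetic placeholder data, and the threshold-selection rule are illustrative assumptions, not the thesis's actual implementation.

```python
# Minimal sketch (not the thesis code): log-mel filterbank features for a
# recording, plus AUC / sensitivity / specificity for a simple classifier.
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve

def logmel_features(wav_path, sr=16000, n_mels=40):
    """Log-mel filterbank energies, averaged over time into one vector per recording."""
    y, sr = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    logmel = librosa.power_to_db(mel)   # shape: (n_mels, n_frames)
    return logmel.mean(axis=1)          # crude per-recording summary

# X: one feature vector per recording; y: 1 = cognitively impaired, 0 = not.
# Random placeholders stand in for features extracted with logmel_features().
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))
y = rng.integers(0, 2, size=200)

clf = LogisticRegression(max_iter=1000).fit(X, y)
scores = clf.predict_proba(X)[:, 1]

auc = roc_auc_score(y, scores)
fpr, tpr, _ = roc_curve(y, scores)
# Operating point: ROC point closest to the top-left corner (fpr=0, tpr=1).
best = np.argmin(np.hypot(fpr, 1 - tpr))
sensitivity = tpr[best]        # true positive rate
specificity = 1 - fpr[best]    # true negative rate
print(f"AUC={auc:.2f}  sens={sensitivity:.2f}  spec={specificity:.2f}")
```

The operating point above is just one common convention for reporting a single sensitivity/specificity pair from an ROC curve; the thesis may select its threshold differently.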
Description
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019. Cataloged from PDF version of thesis. Includes bibliographical references (pages 141-165).
Date issued
2019
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.