Comparison of Natural Language Processing Models for Depression Detection in Chatbot Dialogues
Author(s)
Belser, Christian Alexander
Advisor
Fletcher, Richard Ribon
Abstract
Depression is a major challenge in the world today and a leading cause of disability. In the US, a recent study showed that approximately 36 million adults had at least one major depressive episode, including some with severe impairment [1]. However, approximately two-thirds of all depression cases are never diagnosed [2], largely due to a shortage of trained mental health professionals as well as a lingering cultural stigma that often prevents afflicted people from seeking professional care. To address this need, there is emerging interest in using computer algorithms to automatically screen for depression, which could be deployed widely to the public via clinical websites and mobile apps. Within this field, Dr. Fletcher's group at MIT develops mobile platforms that support mental wellness and psychotherapy, including tools that screen for mental health disorders and refer people to treatment. As part of this work, this thesis compares three distinct Natural Language Processing (NLP) models used to screen for depression. I revised and updated three state-of-the-art model families to screen for depression in individuals: (1) bi-directional gated recurrent unit (BGRU) models, (2) hierarchical attention networks (HAN), and (3) long-sequence Transformer models. All models were trained and tested on a common standard clinical dataset (DAIC-WOZ) derived from clinical patient interviews. After optimization, and after exploring several variants of each model type, the following results were obtained: BGRU (accuracy=0.71, precision=0.65, recall=0.63, F1-score=0.64, MCC=0.20); HAN (accuracy=0.77, precision=0.76, recall=0.77, F1-score=0.76, MCC=0.46); Transformer (accuracy=0.77, precision=0.76, recall=0.77, F1-score=0.76, MCC=0.43). In addition to model performance, I also compare the different categories of models based on computational resources and input token size. Finally, I discuss the future evolution of these models and provide recommendations for specific use cases.
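For readers unfamiliar with the first model family, the sketch below shows a minimal bidirectional-GRU text classifier in PyTorch. It is purely illustrative: the vocabulary size, embedding and hidden dimensions, and single-layer configuration are placeholder assumptions for this example, not the architecture or hyperparameters used in the thesis.

```python
import torch
import torch.nn as nn

class BGRUClassifier(nn.Module):
    """Minimal bidirectional-GRU (BGRU) text classifier (illustrative only)."""

    def __init__(self, vocab_size=30000, embed_dim=128, hidden_dim=64, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        # Forward and backward final states are concatenated, hence 2 * hidden_dim
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer-encoded transcript tokens
        embedded = self.embedding(token_ids)
        # hidden: (num_directions=2, batch, hidden_dim) final states
        _, hidden = self.gru(embedded)
        # Summarize the sequence by concatenating forward and backward states
        summary = torch.cat([hidden[0], hidden[1]], dim=-1)
        return self.classifier(summary)  # (batch, num_classes) logits

# Dummy forward pass: a batch of two 50-token sequences
model = BGRUClassifier()
logits = model(torch.randint(1, 30000, (2, 50)))
print(logits.shape)  # torch.Size([2, 2])
```

The five metrics reported in the abstract can be computed with scikit-learn; this is a hedged example with hypothetical labels, as the thesis does not state which library was used:

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, matthews_corrcoef)

# Hypothetical binary labels: 1 = screened positive for depression
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(f"accuracy={accuracy_score(y_true, y_pred):.2f}, "
      f"precision={precision_score(y_true, y_pred):.2f}, "
      f"recall={recall_score(y_true, y_pred):.2f}, "
      f"F1-score={f1_score(y_true, y_pred):.2f}, "
      f"MCC={matthews_corrcoef(y_true, y_pred):.2f}")
```

Unlike accuracy and F1-score, the Matthews correlation coefficient (MCC) accounts for all four confusion-matrix cells, which is why the abstract's MCC values separate the models more sharply than their otherwise identical accuracy and F1 scores.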
Date issued
2023-09
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology