dc.contributor.author | Xu, Xuhai | |
dc.contributor.author | Yao, Bingsheng | |
dc.contributor.author | Dong, Yuanzhe | |
dc.contributor.author | Gabriel, Saadia | |
dc.contributor.author | Yu, Hong | |
dc.contributor.author | Hendler, James | |
dc.contributor.author | Ghassemi, Marzyeh | |
dc.contributor.author | Dey, Anind K. | |
dc.contributor.author | Wang, Dakuo | |
dc.date.accessioned | 2024-04-04T17:21:36Z | |
dc.date.available | 2024-04-04T17:21:36Z | |
dc.date.issued | 2024-03-06 | |
dc.identifier.issn | 2474-9567 | |
dc.identifier.uri | https://hdl.handle.net/1721.1/154068 | |
dc.description.abstract | Advances in large language models (LLMs) have empowered a variety of applications. However, there is still a significant gap in research when it comes to understanding and enhancing the capabilities of LLMs in the field of mental health. In this work, we present a comprehensive evaluation of multiple LLMs on various mental health prediction tasks via online text data, including Alpaca, Alpaca-LoRA, FLAN-T5, GPT-3.5, and GPT-4. We conduct a broad range of experiments, covering zero-shot prompting, few-shot prompting, and instruction fine-tuning. The results indicate promising yet limited performance of LLMs with zero-shot and few-shot prompt designs for mental health tasks. More importantly, our experiments show that instruction fine-tuning can significantly boost the performance of LLMs on all tasks simultaneously. Our best fine-tuned models, Mental-Alpaca and Mental-FLAN-T5, outperform the best prompt design of GPT-3.5 (25 and 15 times bigger) by 10.9% on balanced accuracy and the best of GPT-4 (250 and 150 times bigger) by 4.8%. They further perform on par with the state-of-the-art task-specific language model. We also conduct an exploratory case study on LLMs' capability for mental health reasoning tasks, illustrating the promising capability of certain models such as GPT-4. We summarize our findings into a set of action guidelines for potential methods to enhance LLMs' capability for mental health tasks. Meanwhile, we also emphasize the important limitations that must be addressed before these models can be deployed in real-world mental health settings, such as known racial and gender bias. We highlight the important ethical risks accompanying this line of research. | en_US |
dc.publisher | Association for Computing Machinery | en_US |
dc.relation.isversionof | 10.1145/3643540 | en_US |
dc.rights | Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. | en_US |
dc.source | ACM | en_US |
dc.subject | Computer Networks and Communications | en_US |
dc.subject | Hardware and Architecture | en_US |
dc.subject | Human-Computer Interaction | en_US |
dc.title | Mental-LLM: Leveraging Large Language Models for Mental Health Prediction via Online Text Data | en_US |
dc.type | Article | en_US |
dc.identifier.citation | Xuhai Xu, Bingsheng Yao, Yuanzhe Dong, Saadia Gabriel, Hong Yu, James Hendler, Marzyeh Ghassemi, Anind K. Dey, and Dakuo Wang. 2024. Mental-LLM: Leveraging Large Language Models for Mental Health Prediction via Online Text Data. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 8, 1, Article 31 (March 2024), 32 pages. | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
dc.contributor.department | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory | |
dc.relation.journal | Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies | en_US |
dc.identifier.mitlicense | PUBLISHER_POLICY | |
dc.eprint.version | Final published version | en_US |
dc.type.uri | http://purl.org/eprint/type/JournalArticle | en_US |
eprint.status | http://purl.org/eprint/status/PeerReviewed | en_US |
dc.date.updated | 2024-04-01T07:49:42Z | |
dc.language.rfc3066 | en | |
dc.rights.holder | The author(s) | |
dspace.date.submission | 2024-04-01T07:49:42Z | |
mit.journal.volume | 8 | en_US |
mit.journal.issue | 1 | en_US |
mit.license | PUBLISHER_POLICY | |
mit.metadata.status | Authority Work and Publication Information Needed | en_US |