Simple item record

dc.contributor.author: Xu, Xuhai
dc.contributor.author: Yao, Bingsheng
dc.contributor.author: Dong, Yuanzhe
dc.contributor.author: Gabriel, Saadia
dc.contributor.author: Yu, Hong
dc.contributor.author: Hendler, James
dc.contributor.author: Ghassemi, Marzyeh
dc.contributor.author: Dey, Anind K.
dc.contributor.author: Wang, Dakuo
dc.date.accessioned: 2024-04-04T17:21:36Z
dc.date.available: 2024-04-04T17:21:36Z
dc.date.issued: 2024-03-06
dc.identifier.issn: 2474-9567
dc.identifier.uri: https://hdl.handle.net/1721.1/154068
dc.description.abstract: Advances in large language models (LLMs) have empowered a variety of applications. However, there is still a significant research gap in understanding and enhancing the capabilities of LLMs in the field of mental health. In this work, we present a comprehensive evaluation of multiple LLMs, including Alpaca, Alpaca-LoRA, FLAN-T5, GPT-3.5, and GPT-4, on various mental health prediction tasks via online text data. We conduct a broad range of experiments covering zero-shot prompting, few-shot prompting, and instruction fine-tuning. The results indicate promising yet limited performance of LLMs with zero-shot and few-shot prompt designs on mental health tasks. More importantly, our experiments show that instruction fine-tuning can significantly boost the performance of LLMs on all tasks simultaneously. Our best fine-tuned models, Mental-Alpaca and Mental-FLAN-T5, outperform the best prompt design of GPT-3.5 (25 and 15 times bigger, respectively) by 10.9% on balanced accuracy, and the best of GPT-4 (250 and 150 times bigger) by 4.8%. They further perform on par with the state-of-the-art task-specific language model. We also conduct an exploratory case study on LLMs' capability in mental health reasoning tasks, illustrating the promising capability of certain models such as GPT-4. We summarize our findings into a set of action guidelines for potential methods to enhance LLMs' capability for mental health tasks. At the same time, we emphasize important limitations that must be addressed before these models can be deployed in real-world mental health settings, such as known racial and gender bias, and we highlight the significant ethical risks accompanying this line of research. (en_US)
dc.publisher: Association for Computing Machinery (en_US)
dc.relation.isversionof: 10.1145/3643540 (en_US)
dc.rights: Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. (en_US)
dc.source: ACM (en_US)
dc.subject: Computer Networks and Communications (en_US)
dc.subject: Hardware and Architecture (en_US)
dc.subject: Human-Computer Interaction (en_US)
dc.title: Mental-LLM: Leveraging Large Language Models for Mental Health Prediction via Online Text Data (en_US)
dc.type: Article (en_US)
dc.identifier.citation: Xuhai Xu, Bingsheng Yao, Yuanzhe Dong, Saadia Gabriel, Hong Yu, James Hendler, Marzyeh Ghassemi, Anind K. Dey, and Dakuo Wang. 2024. Mental-LLM: Leveraging Large Language Models for Mental Health Prediction via Online Text Data. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 8, 1, Article 31 (March 2024), 32 pages. (en_US)
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
dc.relation.journal: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (en_US)
dc.identifier.mitlicense: PUBLISHER_POLICY
dc.eprint.version: Final published version (en_US)
dc.type.uri: http://purl.org/eprint/type/JournalArticle (en_US)
eprint.status: http://purl.org/eprint/status/PeerReviewed (en_US)
dc.date.updated: 2024-04-01T07:49:42Z
dc.language.rfc3066: en
dc.rights.holder: The author(s)
dspace.date.submission: 2024-04-01T07:49:42Z
mit.journal.volume: 8 (en_US)
mit.journal.issue: 1 (en_US)
mit.license: PUBLISHER_POLICY
mit.metadata.status: Authority Work and Publication Information Needed (en_US)
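
The abstract above mentions zero-shot prompting as one of the evaluated approaches. As a rough illustration only, the Python sketch below shows what a zero-shot mental health prediction prompt over online text might look like. Everything in it is an assumption made here for clarity: the OpenAI Python SDK, the example post, the prompt wording, and the Yes/No depression label are not taken from the paper, whose actual prompt templates, datasets, and fine-tuned models (Mental-Alpaca, Mental-FLAN-T5) are described in the article itself.

    from openai import OpenAI

    client = OpenAI()  # reads the OPENAI_API_KEY environment variable

    # Hypothetical online text; the paper evaluates real social media datasets.
    post = "I haven't slept properly in weeks and nothing feels worth doing."

    # A simplified zero-shot prompt: no labeled examples are provided.
    prompt = (
        "You will read a social media post and assess the poster's mental state.\n"
        f"Post: {post}\n"
        "Question: Does the poster show signs of depression? Answer Yes or No."
    )

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic output suits classification-style tasks
    )
    print(response.choices[0].message.content)

Few-shot prompting extends the same idea by prepending a handful of labeled post-and-answer examples to the prompt, while instruction fine-tuning instead updates a smaller model's weights on instruction-formatted examples, the route the abstract reports as most effective.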

