Forecasting mental distress using healthcare claims data

Author(s)

Taylor, Sara Ann.

Other Contributors

Program in Media Arts and Sciences (Massachusetts Institute of Technology)

Advisor

Rosalind W. Picard.

Terms of use

MIT theses may be protected by copyright. Please reuse MIT thesis content according to the MIT Libraries Permissions Policy, which is available through the URL provided. http://dspace.mit.edu/handle/1721.1/7582

Abstract

Depression rates in the US have recently reached record levels: 7.1% of adults had at least one major depressive episode in 2015, and an estimated 7 million American adults aged 65 and older experience depression. Anxiety disorders are also on the rise, with a recent review estimating a prevalence of up to 25% in the general population. This dissertation focuses on estimating and forecasting mental distress using data from electronic health records and insurance claims to try to answer a fundamental question: Can we predict who will need mental health help before they need it? If these individuals can be identified, we can develop ways to quickly mobilize resources in response to any increase in symptoms, and develop methods to mitigate the effects of mental distress through ongoing baseline treatments.

Following a brief high-level review of the US healthcare system and its data sources, we use various standardized survey scores stored in Electronic Health Records (EHRs) to define how mental distress is categorized in the more widely available claims data. We achieve a Matthews correlation coefficient of 0.29 and an accuracy of 75% on a hold-out test set. These definitions are then used throughout the rest of the dissertation as the label of interest. We also describe a state-space based generalized linear model that can be used to estimate the rate of healthcare events. We found that the state-space models needed only a 16-day history to achieve accuracies similar to those of a static model with an 85-day history. Finally, we forecast distress using demographic information and healthcare event rate features. We report Matthews correlation coefficients, accuracy, and other metrics for predicting 1, 3, 6, and 12 months into the future.

On a hold-out test set, we achieved accuracies of 89%, 74%, 59%, and 47% for forecasting the presence of a distress event 1, 3, 6, and 12 months into the future, respectively (compared to a baseline static model with accuracies of 78%, 63%, 49%, and 34%). We found that including the current distress label significantly improved the forecast results for the next period.
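
The abstract reports both accuracy and the Matthews correlation coefficient (MCC). As a minimal sketch of why the two can diverge on imbalanced labels, the following Python computes both from binary predictions. The function names and the toy labels are hypothetical and chosen only for illustration; they are not taken from the thesis, and returning 0.0 when the MCC denominator is zero is a common convention assumed here rather than something the source specifies.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (0 or 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: an undefined MCC is reported as 0.0.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy data (made up): 10 positive cases among 100 people, and a
# classifier that always predicts the majority (negative) class.
y_true = [1] * 10 + [0] * 90
y_pred = [0] * 100

counts = confusion_counts(y_true, y_pred)
print("accuracy:", accuracy(*counts))
print("MCC:", matthews_corrcoef(*counts))
```

In this toy case the always-negative classifier reaches an accuracy of 0.9 while its MCC is 0.0, which illustrates why a chance-corrected metric is useful alongside accuracy when the positive class is rare.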

Description

Thesis: Ph. D., Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, May, 2020

Cataloged from the official PDF of thesis.

Includes bibliographical references (pages 169-177).

Date issued

2020

URI

https://hdl.handle.net/1721.1/127498

Department

Program in Media Arts and Sciences (Massachusetts Institute of Technology)

Publisher

Massachusetts Institute of Technology

Keywords

Program in Media Arts and Sciences

Collections

Doctoral Theses