Characterizing Variation in Healthcare across Time and Providers using Machine Learning
Author(s)
Ji, Christina X.
Advisor
Sontag, David
Abstract
Modeling healthcare decisions and their outcomes is a complex problem. In addition to being affected by patient characteristics, the prognosis can vary depending on when the patient is receiving care, and treatment decisions can vary depending on who makes the decisions. In this thesis, we consider two axes of variation in healthcare: over time and across providers. For both axes, we focus on identifying when variation exists, characterizing the patients who are affected by such variation, and addressing shifts due to this variation. The solutions we propose draw ideas from causality and dataset shift.
In the first part of this thesis, we address these three aspects for variation over time. First, we create an algorithm that can detect when a model is affected by change over time and identify sub-populations where the model is more affected. We use our algorithm to perform a large-scale study of temporal shifts in health insurance claims. We demonstrate that changes over time are prevalent in healthcare and examine case studies to better understand the drivers of such changes. Next, we examine how to learn a model that can perform well on current data. Because data from the current time period is limited, we consider several methods that can leverage sequences of historical data to learn a good image classification model for the final time step. We build a benchmark for evaluating these methods on sequences constructed from synthetic shifts and validate our conclusions on a real-world dataset.
In the second part of this thesis, we address similar questions for variation across providers. First, we create a statistical approach to test whether significant variation exists across providers. Our approach involves learning a model of treatment decisions with provider-specific random effects. We perform a case study on first-line type 2 diabetes treatment and find that significant variation exists across providers. Then, we develop an algorithm for identifying regions of patients with the most disagreement between providers. We formalize this as a causal inference problem, where disagreement is defined by the causal effect of the provider on the treatment decision. We illustrate this algorithm on first-line type 2 diabetes and Parkinson's treatment decisions and uncover regions of variation that align with uncertainty in clinical guidelines.
In the third part of this thesis, we build a tool for examining the effects of variation over time or across providers for individual patients. We use a large language model built on electronic health record concepts to generate patient trajectories. To enable interventions on time and provider, we introduce new tokenizations for these concepts. We also incorporate a structural causal model for patient visits to allow for generation of interventional and counterfactual trajectories. We hope the model in this part of the thesis can be used to answer additional questions about how patient trajectories would change if they were treated during a different time period or by a different provider.
Date issued
2024-09
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology