Knowledge Distillation for Interpretable Clinical Time Series Outcome Prediction
Author(s)
Wong, Anna
Advisor
Mark, Roger G.
Lehman, Li-wei
Abstract
A common machine learning task in healthcare is to predict a patient's final outcome given their history of vital signs and treatments. For example, sepsis is a life-threatening condition that occurs when the body has an extreme response to an infection. Treating sepsis is a complicated process, and we are interested in predicting a sepsis patient's final outcome. Neural networks are powerful models for making accurate predictions of such outcomes, but a major drawback is that they are not interpretable. Accurately predicting treatment outcomes while also understanding how a model arrives at its predictions is necessary for these models and algorithms to be used in real-world clinical settings.
In this thesis, we use knowledge distillation, a technique that takes a model with high predictive power (the "teacher model") and uses it to train a model with other desirable traits such as interpretability (the "student model"). For our teacher model, we use an LSTM, a type of recurrent neural network, to predict mortality for sepsis patients given their recent history of vital signs and treatments. For our student model, we use an autoregressive hidden Markov model to learn interpretable hidden states. To incorporate the knowledge from the teacher model into the student model, we use a similarity-based constraint. We evaluate a method from prior work that uses variational inference to learn the hidden states, and we develop and evaluate an alternative approach that uses the expectation-maximization algorithm. We analyze the interpretability of the learned states. Our results show that, although adding the similarity constraint leaves room for improvement in the model's generative performance, the expectation-maximization approach successfully incorporates the constraint, achieving predictive power comparable to the teacher model along with better interpretability than the teacher model.
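To make the idea of a similarity-based distillation constraint concrete, the following is a minimal illustrative sketch, not the thesis's actual objective or implementation. It assumes a generic setup in which the teacher's risk scores define which patients should be treated as similar, and the student's hidden-state distributions are penalized for differing across teacher-similar patients; the names student_log_lik, student_state_probs, teacher_logits, and lambda_sim are hypothetical placeholders.

    # Hypothetical sketch of a similarity-based distillation penalty (not the thesis code).
    import torch

    def distillation_loss(student_log_lik, student_state_probs, teacher_logits, lambda_sim=1.0):
        """Combine the student's generative fit with a penalty that encourages
        patients the teacher scores similarly to share similar student hidden-state
        distributions. Shapes: student_log_lik (batch,), student_state_probs (batch, K),
        teacher_logits (batch,)."""
        # Pairwise similarity of teacher risk scores (1 = identical risk, 0 = maximally different)
        teacher_risk = torch.sigmoid(teacher_logits)
        teacher_sim = 1.0 - torch.cdist(teacher_risk[:, None], teacher_risk[:, None])

        # Pairwise distance between student hidden-state distributions
        student_dist = torch.cdist(student_state_probs, student_state_probs)

        # Penalize large student distances between patients the teacher deems similar
        sim_penalty = (teacher_sim.clamp(min=0) * student_dist).mean()

        # Negative log-likelihood preserves the student's generative objective
        return -student_log_lik.mean() + lambda_sim * sim_penalty

In this sketch the weight lambda_sim trades off generative fit against agreement with the teacher; the thesis instead incorporates its constraint within variational inference and an expectation-maximization procedure for the autoregressive hidden Markov model.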
Date issued
2023-06
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology