New interpretable machine learning techniques and an application to stroke prediction in atrial fibrillation patients

Yang, Hongyu, Ph. D., Massachusetts Institute of Technology.

Download: 1142633819-MIT.pdf (10.15Mb)

Other Contributors

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.

Advisor

Cynthia Rudin.

Terms of use

MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582

Abstract

Building interpretable and accurate models is attracting more and more interest in the machine learning community. In this thesis, we developed an interpretable machine learning algorithm called SBRL, and we built an interpretable and statistically more accurate model for predicting strokes in patients with atrial fibrillation (AF) who have no prior history of stroke and who are not taking anticoagulants. The first part of the thesis presents an interpretable machine learning algorithm that can be used as an alternative to the decision tree algorithm. Our algorithm builds an optimized rule list model from data by maximizing the posterior probability of a natural hierarchical generative model. The model takes the form of chained IF-THEN clauses, which makes it simple for a human to follow and to derive its predictions by hand. We developed two theoretical bounds for the algorithm: one for the length of the optimal rule list model, and the other for the upper bound of the posterior probability of the optimized rule list given its prefixes. We thoroughly tested our algorithm against other interpretable and non-interpretable machine learning algorithms across multiple public datasets, in terms of interpretability, computational speed, and accuracy. Our algorithm strikes a balance among these metrics.

The second part of the thesis presents how we used the ATRIA2-CVRN study cohort to build a stroke prediction model that is as simple as, but statistically significantly more accurate than, the stroke models in wide use, such as the CHA₂DS₂-VASc and ATRIA scores, for patients in AF who are not taking anticoagulants. We focused on the more challenging problem of primary prevention. We assessed the strengths of predictors and identified informative predictors not used in existing stroke models.

We created a univariate stroke model using the most informative predictor, age, and achieved statistically significantly better performance than CHA₂DS₂-VASc and performance similar to ATRIA. We used various machine learning models to test the limit of the information that can be extracted from the data. We built a linear model with optimized integer coefficients using RiskSLIM. We used SBRL to generate simple yet accurate representations of high-risk patients who should be recommended anticoagulants.
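As a reading aid, the chained IF-THEN form of a rule list mentioned in the abstract can be sketched in a few lines of Python. This is only an illustration of how such a model is evaluated by hand, not the SBRL algorithm itself and not a model from the thesis. The names `Rule`, `predict`, and `flag`, along with every threshold and risk value, are hypothetical placeholders invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Patient = Dict[str, float]


@dataclass
class Rule:
    """One IF-THEN clause: a condition and the risk it assigns."""
    description: str
    condition: Callable[[Patient], bool]
    risk: float  # hypothetical value, not an estimate from any study


def predict(rules: List[Rule], default_risk: float, patient: Patient) -> Tuple[str, float]:
    """Return the first matching rule's description and risk.

    Rules are checked in order, so each rule implicitly applies only to
    patients not captured by an earlier rule (IF ... ELSE IF ... ELSE).
    """
    for rule in rules:
        if rule.condition(patient):
            return rule.description, rule.risk
    return "default", default_risk


# Hypothetical rule list; "flag" stands in for an arbitrary binary predictor.
RULES = [
    Rule("age >= 75 and flag", lambda p: p["age"] >= 75 and p["flag"] == 1, 0.30),
    Rule("age >= 75", lambda p: p["age"] >= 75, 0.20),
    Rule("flag", lambda p: p["flag"] == 1, 0.10),
]
DEFAULT_RISK = 0.05  # hypothetical ELSE branch

if __name__ == "__main__":
    for patient in ({"age": 80, "flag": 1}, {"age": 80, "flag": 0}, {"age": 60, "flag": 0}):
        print(patient, predict(RULES, DEFAULT_RISK, patient))
```

Because the first matching clause decides the prediction, the order of the rules is part of the model. Learning that order and the rules themselves from data is the part that an algorithm such as SBRL addresses, and this sketch does not attempt it.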

Description

Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019

Cataloged from PDF version of thesis.

Includes bibliographical references (pages 117-125).

Date issued

2019

URI

https://hdl.handle.net/1721.1/124095

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Keywords

Electrical Engineering and Computer Science.

Collections

Doctoral Theses