New interpretable machine learning techniques and an application to stroke prediction in atrial fibrillation patients
Author(s)
Yang, Hongyu, Ph. D. Massachusetts Institute of Technology.
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Cynthia Rudin.
Abstract
Building models that are both interpretable and accurate is attracting growing interest in the machine learning community. In this thesis, we developed an interpretable machine learning algorithm called SBRL, and we built an interpretable and statistically significantly more accurate model for predicting strokes in patients with atrial fibrillation (AF) who have no prior history of stroke and who are not taking anticoagulants. The first part of the thesis presents an interpretable machine learning algorithm that can be used as an alternative to decision tree algorithms. Our algorithm builds an optimized rule list model from data by maximizing the posterior probability of a natural hierarchical generative model. The model takes the form of chained IF-THEN clauses, which are simple for a human to follow when deriving a prediction by hand. We developed two theoretical bounds for the algorithm: one on the length of the optimal rule list, and the other an upper bound on the posterior probability of the optimized rule list given its prefixes. We thoroughly tested our algorithm against other interpretable and non-interpretable machine learning algorithms on multiple public datasets in terms of interpretability, computational speed, and accuracy. Our algorithm strikes a balance among these metrics.
The second part of the thesis presents how we used the ATRIA2-CVRN study cohort to build a stroke prediction model that is as simple as, but statistically significantly more accurate than, the stroke models in wide use, such as the CHA₂DS₂-VASc and ATRIA scores, for patients in AF who are not taking anticoagulants. We focused on the more challenging problem of primary prevention. We assessed the strengths of predictors and identified informative predictors not used in existing stroke models. We created a univariate stroke model using the most informative predictor, age, and achieved statistically significantly better performance than CHA₂DS₂-VASc and performance similar to ATRIA. We used various machine learning models to test the limit of the information that can be extracted from the data. We built a linear model with optimized integer coefficients using RiskSLIM. We used SBRL to generate simple yet accurate representations for high-risk patients who should be recommended anticoagulants.
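To make the "chained IF-THEN clauses" form mentioned in the abstract concrete, the following is a minimal sketch of how a rule list produces a prediction. It is not the SBRL algorithm itself (it does not learn or optimize anything) and it is not taken from the thesis: the rules, the feature names `age` and `diabetes`, the risk values, and the function `predict` are all hypothetical placeholders invented for illustration, with no clinical meaning.

```python
from typing import Callable, Dict, List, Tuple

Patient = Dict[str, float]
# Each rule: (human-readable description, condition, predicted risk).
Rule = Tuple[str, Callable[[Patient], bool], float]

# Hypothetical rules and risk values, invented for illustration only.
# They are NOT from the thesis or from any clinical source.
RULES: List[Rule] = [
    ("age >= 85", lambda p: p["age"] >= 85, 0.30),
    ("age >= 75 and diabetes",
     lambda p: p["age"] >= 75 and p["diabetes"] == 1, 0.20),
    ("age >= 65", lambda p: p["age"] >= 65, 0.10),
]
DEFAULT_RISK = 0.02  # hypothetical value used when no rule fires


def predict(patient: Patient) -> Tuple[str, float]:
    """Walk the rules in order and return the first one that matches.

    This mirrors an IF / ELSE IF / ... / ELSE chain: order matters,
    and later rules are only reached if all earlier ones failed.
    """
    for description, condition, risk in RULES:
        if condition(patient):
            return description, risk
    return "default", DEFAULT_RISK


if __name__ == "__main__":
    rule, risk = predict({"age": 80, "diabetes": 1})
    print(f"matched rule: {rule}, predicted risk: {risk}")
```

Because the first matching rule decides the outcome, a reader can trace any prediction by hand from top to bottom, which is the interpretability property the abstract describes.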
Description
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019. Cataloged from PDF version of thesis. Includes bibliographical references (pages 117-125).
Date issued
2019
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.