Supersparse linear integer models for optimized medical scoring systems
Author(s)
Ustun, Berk; Rudin, Cynthia
Download10994_2015_5528_ReferencePDF.pdf (748.0Kb)
OPEN_ACCESS_POLICY
Open Access Policy
Creative Commons Attribution-Noncommercial-Share Alike
Terms of use
Metadata
Show full item recordAbstract
Scoring systems are linear classification models that only require users to add, subtract and multiply a few small numbers in order to make a prediction. These models are in widespread use by the medical community, but are difficult to learn from data because they need to be accurate and sparse, have coprime integer coefficients, and satisfy multiple operational constraints. We present a new method for creating data-driven scoring systems called a Supersparse Linear Integer Model (SLIM). SLIM scoring systems are built by using an integer programming problem that directly encodes measures of accuracy (the 0–1 loss) and sparsity (the ℓ[subscript 0]-seminorm) while restricting coefficients to coprime integers. SLIM can seamlessly incorporate a wide range of operational constraints related to accuracy and sparsity, and can produce acceptable models without parameter tuning because of the direct control provided over these quantities. We provide bounds on the testing and training accuracy of SLIM scoring systems, and present a new data reduction technique that can improve scalability by eliminating a portion of the training data beforehand. Our paper includes results from a collaboration with the Massachusetts General Hospital Sleep Laboratory, where SLIM is being used to create a highly tailored scoring system for sleep apnea screening.
Date issued
2015-11Department
Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; Sloan School of ManagementJournal
Machine Learning
Publisher
Springer US
Citation
Ustun, Berk, and Cynthia Rudin. “Supersparse Linear Integer Models for Optimized Medical Scoring Systems.” Machine Learning 102.3 (2016): 349–391.
Version: Author's final manuscript
ISSN
0885-6125
1573-0565