Improving Efficiency and Fairness in Machine Learning: a Discrete Optimization Approach
Author(s)
Bandi, Hari
Advisor
Bertsimas, Dimitris
Abstract
In recent years, machine learning models have been increasingly deployed in applications such as education, finance, healthcare, and transportation. However, in most practical situations, one-size-fits-all solutions suffer from poor predictive performance, bias against certain subgroups, or both. This calls for new approaches that enhance the robustness, interpretability, and fairness of the resulting machine learning systems. We borrow tools from discrete and robust optimization to develop models and algorithms for such systems.
The first part of this thesis develops novel methodologies to improve the performance of specific predictive models. In the first chapter, we propose a novel Mixed Integer Optimization (MIO) formulation that optimally recovers the parameters of a Gaussian mixture model (GMM) when the mixture component weights are known. It does so by minimizing a discrepancy measure (either the Kolmogorov-Smirnov distance or the total variation distance) between the empirical distribution function and the distribution function of the GMM. In the second chapter, we present a holistic framework that employs tensor completion and robust optimization to prescribe the composition of the influenza vaccine. We also build an optimal classification tree to predict the efficacy of the proposed vaccine, in terms of morbidity and mortality rates, for different countries.
In the second part of the thesis, we present novel algorithms to alleviate systemic bias with respect to gender, race, and ethnicity. Such bias is often unconscious but is prevalent in datasets that record choices made by people. We propose (a) a novel optimization approach that optimally flips outcome labels while simultaneously training classification models, in order to discover changes to the selection process that achieve diversity without significantly affecting meritocracy; and (b) a novel implementation tool that employs optimal classification trees to reveal which attributes of individuals lead to their labels being flipped, helping human decision makers change current selection processes in a way they can understand.
In the final chapter, we apply this work to predicting discharge disposition for trauma patients. We debias the dataset with respect to race and train optimal classification trees to predict discharge decisions for trauma patients with penetrating injuries. Our contribution here is twofold: (1) alleviating bias to enhance diversity in discharge decisions, and developing an implementation tool based on optimal classification trees to promote changes in the selection process; and (2) improving the predictive performance (AUC) of the resulting classifiers after debiasing the dataset.
Date issued
2021-09
Department
Massachusetts Institute of Technology. Operations Research Center
Publisher
Massachusetts Institute of Technology