Advancing Equity and Reliability in Machine Learning
Author(s)
Shanmugam, Divya
Thesis PDF (19.26 MB)
Advisor
Guttag, John V.
Abstract
The data we have are often not the data we wish to use. This distinction can have serious consequences for the behavior of machine learning models across environments and demographic subgroups. If a disease is systematically underdiagnosed, machine learning models trained on this data risk replicating patterns of underdiagnosis. If the data used to evaluate machine learning models are not representative of data the models encounter during deployment, we risk missing model failures on subsets of the data distribution. If the demographics we use to assess the fairness of machine learning models are excessively coarse, we risk missing significant disparities in algorithmic performance. For domains in which flawed data is common, these systematic differences represent a barrier to the widespread adoption of machine learning systems. In this thesis, we develop methods to encourage machine learning predictions to be reliable and equitable even when the underlying data are not. We approach this goal in three ways. We do so first by taking a data-centric lens, and developing methods to precisely characterize differences between the data we have and the data we wish to have (Chapters 2 & 3). We then adopt a model-centric lens to consider how one might efficiently update models without access to the training data (Chapters 4 & 5). Finally, we provide commentary on standard approaches to the use of race when evaluating machine learning systems (Chapter 6). In sum, this dissertation is a step towards machine learning methodology that is robust to the inevitably unreliable and inequitable data we are able to observe.
Date issued
2024-05

Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher
Massachusetts Institute of Technology