Operationalizing Reliable Machine Learning: From Data Collection to Model Presentation
Author(s)
Balagopalan, Aparna
Advisor
Ghassemi, Marzyeh
Abstract
Automated systems driven by machine learning (ML) have made exciting progress across a spectrum of applications. Despite this progress, encoded biases and other failure modes may limit the real-world utility and reliability of such systems. For example, nonrandom data missingness, biased algorithmic optimization objectives, or model presentation strategies that inappropriately shape user trust can all cause models to fail in practice. In this thesis, guided by these observations and by prior work on pipeline-aware machine learning, we aim to operationalize reliable ML. Toward this goal, we propose a framework with three components: responsible data collection, robust algorithm development, and fair model presentation. First, we conduct two case studies to advance responsible data collection. We investigate whether standard data acquisition procedures can be repurposed when training models to mimic human judgments about norm violations. We also demonstrate patterns of delayed demographic data reporting within a longitudinal healthcare dataset and show that time-varying missingness caused by such delays can distort disparity assessments. Second, we introduce two novel algorithms to improve reliability: a method that leverages representations from vision-language models to filter noisy training data, and a method that produces fair rankings while accounting for properties of search queries. Finally, since the way predictions are presented affects how much model consumers trust them, we propose metrics to quantify the fairness of post-hoc explainability techniques. Overall, this thesis re-evaluates measurements throughout the machine learning pipeline and contributes to the broader goal of reliable machine learning.
Date issued
2025-09
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology