Operationalizing Reliable Machine Learning: From Data Collection to Model Presentation

Author(s)

Balagopalan, Aparna

Advisor

Ghassemi, Marzyeh

Terms of use

In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/

Abstract

Automated systems driven by machine learning (ML) have made exciting progress across a spectrum of applications. Despite such progress, encoded biases and other failure modes may create barriers to the real-world utility and reliability of such systems. For example, nonrandom data missingness, biased algorithmic optimization objectives, or model presentation strategies that incorrectly impact user trust can all cause models to fail in practice. In this thesis, guided by such observations and prior work on pipeline-awareness in machine learning, we aim to operationalize reliable ML. Toward this goal, we propose a framework consisting of three components: responsible data collection, robust algorithm development, and fair model presentation. We first conduct two case studies to advance responsible data collection. We investigate whether standard procedures for acquiring data can be repurposed when training models to mimic human judgments about norm violations. We also demonstrate patterns of delayed demographic data reporting within a longitudinal healthcare dataset and show that time-varying missingness due to such delays can distort disparity assessments. Second, we introduce two novel algorithms to improve reliability: a method that leverages representations from vision-language models to filter noisy training data, and a method to produce fair rankings that account for properties of search queries. Finally, since the presentation design of predictions affects the trust of model consumers, we propose metrics to quantify the fairness of post-hoc explainability techniques. Thus, with this thesis, we re-evaluate measurements throughout the machine learning pipeline and contribute to the broader goal of reliable machine learning.

Date issued

2025-09

URI

https://hdl.handle.net/1721.1/164591

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Collections

Doctoral Theses