DSpace@MIT
Towards ML Models That We Can Deploy Confidently

Author(s)
Salman, Hadi
Thesis PDF (42.72 MB)
Advisor
Madry, Aleksander
Terms of use
In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/
Abstract
As machine learning (ML) systems are deployed in the real world, the reliability and trustworthiness of these systems become an even more salient challenge. This thesis aims to address this challenge through two key thrusts: (1) making ML models more trustworthy by leveraging what has been perceived solely as a weakness of ML models, namely adversarial perturbations, and (2) exploring the underpinnings of reliable ML deployment. Specifically, in the first thrust, we focus on adversarial perturbations, which constitute a well-known threat to the integrity of ML models, and show how to build ML models that are robust to so-called adversarial patches. We then show that adversarial perturbations can be repurposed: rather than being merely a weakness of ML models, they can bolster these models' resilience and reliability. To this end, we leverage these perturbations to, first, develop a way to create objects that are easier for ML models to recognize, then to devise a way to safeguard images against unwanted AI-powered alterations, and finally to improve transfer learning performance. The second thrust of this thesis revolves around ML model interpretability and debugging so as to ensure the safety, equitability, and unbiased decision-making of ML systems. In particular, we investigate methods for building ML models that are more debuggable and provide tools for diagnosing their failure modes. We then study how data affects model behavior and identify unexpected ways in which data might introduce biases into ML models, particularly in the context of transfer learning. Finally, we put forth a data-based framework for studying transfer learning that can help us discover problematic biases inherited from pretraining data.
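
For readers unfamiliar with the adversarial perturbations the abstract refers to, the short PyTorch sketch below illustrates the general idea: a small, gradient-derived change to an input that increases a model's loss. This is only an illustrative sketch of the classic fast gradient sign method, not code from the thesis; the model, image, label, and epsilon below are hypothetical placeholders.

    # Illustrative sketch (not from the thesis): crafting an adversarial
    # perturbation with the fast gradient sign method (FGSM) in PyTorch.
    import torch
    import torch.nn.functional as F

    def fgsm_perturbation(model, image, label, epsilon=8 / 255):
        """Return a copy of `image` perturbed to increase the model's loss."""
        image = image.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(image), label)
        loss.backward()
        # Step in the direction that increases the loss, bounded by epsilon,
        # and keep pixel values in the valid [0, 1] range.
        perturbed = image + epsilon * image.grad.sign()
        return perturbed.clamp(0, 1).detach()

The thesis's first thrust repurposes perturbations of this general kind constructively, for example to make objects easier to recognize or to protect images from unwanted edits, rather than to attack models.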
Date issued
2023-09
URI
https://hdl.handle.net/1721.1/152859
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Doctoral Theses