
dc.contributor.advisor Mądry, Aleksander
dc.contributor.author Jain, Saachi
dc.date.accessioned 2024-03-21T19:13:36Z
dc.date.available 2024-03-21T19:13:36Z
dc.date.issued 2024-02
dc.date.submitted 2024-02-21T17:18:47.875Z
dc.identifier.uri https://hdl.handle.net/1721.1/153886
dc.description.abstract Neural networks can fail to generalize to real-world data, particularly on subpopulations that might have been mislabelled, corrupted, or underrepresented during training. In such settings, the set of features that a model relies on, or its feature prior, often determines the model’s ultimate reliability. While many factors contribute to a model’s feature prior, recent evidence indicates that the training dataset often plays a pivotal role. This thesis therefore aims to build the foundation for a data-centric perspective on model reliability by uncovering how the training dataset’s composition affects the model’s feature prior, and thus the mistakes the model tends to make. It advances this objective through two main thrusts: developing scalable tools for identifying model failure modes in large datasets, and investigating the impact of pre-training data on the reliability of transfer learning models. In the first thrust, we develop techniques for uncovering meaningful patterns of model errors, especially in settings where manual exploration is prohibitively expensive. This includes building a framework for generating counterfactual images to debug model behavior, as well as introducing a technique for automatically identifying failure modes by distilling them as directions in a latent space. We also propose a data-based approach that mitigates such failures at their source by isolating the training examples that drive a targeted bias. In the second thrust, we investigate the role of the pre-training data in the transfer learning setting, where a pre-trained model is adapted to a downstream task. Here, we first explore the problem of “bias transfer”, where biases from the pre-trained model can persist even after adapting the model to the downstream task. We then introduce transfer influences, a framework for pinpointing the counterfactual impact of a pre-training datapoint on the final prediction. This framework enables us to isolate (and remove) detrimental points from the pre-training dataset to improve transfer learning performance.
dc.publisher Massachusetts Institute of Technology
dc.rights In Copyright - Educational Use Permitted
dc.rights Copyright retained by author(s)
dc.rights.uri https://rightsstatements.org/page/InC-EDU/1.0/
dc.title A Data-Based Perspective on Model Reliability
dc.type Thesis
dc.description.degree Ph.D.
dc.contributor.department Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree Doctoral
thesis.degree.name Doctor of Philosophy
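
The abstract’s first thrust mentions automatically identifying failure modes by distilling them as directions in a latent space. Below is a minimal sketch of one way that idea can be instantiated, assuming you already have embeddings of held-out examples (e.g., from a vision-language model) and a per-example record of whether the model under inspection predicted correctly; the function and variable names here are illustrative, not the thesis’s actual implementation.

```python
# Sketch: recover a candidate failure-mode direction by linearly separating
# correctly vs. incorrectly predicted examples in a shared embedding space.
# `embeddings` and `correct` are assumed inputs, not from the thesis itself.
import numpy as np
from sklearn.svm import LinearSVC

def distill_failure_direction(embeddings: np.ndarray, correct: np.ndarray):
    """Fit a linear separator between correct (1) and incorrect (0)
    predictions; its normal vector is a candidate failure-mode direction."""
    svm = LinearSVC(C=0.1, max_iter=10_000)
    svm.fit(embeddings, correct)
    direction = svm.coef_[0]
    direction /= np.linalg.norm(direction)
    return direction

def rank_by_failure_score(embeddings: np.ndarray, direction: np.ndarray):
    """Project examples onto the failure direction; the lowest-scoring
    indices are the examples most aligned with the failure mode."""
    scores = embeddings @ direction
    return np.argsort(scores)
```

Examples at the negative extreme of the recovered direction can then be surfaced for manual inspection, which is how an abstract latent direction becomes a human-interpretable description of a failure mode.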
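The second thrust’s transfer influences framework attributes a downstream prediction to individual pre-training datapoints. One common way to estimate such counterfactual influence is subset resampling: pretrain on many random subsets of the data, adapt each resulting model to the downstream task, and compare downstream performance between subsets that include a point and subsets that exclude it. The sketch below is a schematic estimator under those assumptions, with a hypothetical `pretrain_and_finetune(subset)` callable standing in for the (expensive) training pipeline; it is not the thesis’s exact method.

```python
# Sketch: subset-resampling estimate of each pre-training point's influence
# on downstream performance. `pretrain_and_finetune` is a hypothetical
# stand-in that returns a downstream metric (e.g., target-task accuracy).
import numpy as np

def estimate_transfer_influences(n_points: int, n_models: int,
                                 subset_frac: float,
                                 pretrain_and_finetune) -> np.ndarray:
    rng = np.random.default_rng(0)
    masks = np.zeros((n_models, n_points), dtype=bool)
    perf = np.zeros(n_models)
    for m in range(n_models):
        subset = rng.choice(n_points, size=int(subset_frac * n_points),
                            replace=False)
        masks[m, subset] = True
        perf[m] = pretrain_and_finetune(subset)
    # Influence of point i: mean downstream performance over models whose
    # pre-training set contained i, minus the mean over models without i.
    # (n_models should be large enough that every point lands in both groups.)
    with_i = np.array([perf[masks[:, i]].mean() for i in range(n_points)])
    without_i = np.array([perf[~masks[:, i]].mean() for i in range(n_points)])
    return with_i - without_i

# Toy usage with a synthetic stand-in for the training pipeline:
# influences = estimate_transfer_influences(
#     n_points=1_000, n_models=200, subset_frac=0.5,
#     pretrain_and_finetune=lambda s: len(s) / 1_000)
```

Points with the most negative estimated influence are the natural candidates for the “isolate (and remove) detrimental points” step the abstract describes.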

