A Data-Based Perspective on Model Reliability
Author(s)
Jain, Saachi
Download: Thesis PDF (83.29 MB)
Advisor
Mądry, Aleksander
Abstract
Neural networks can fail to generalize to real-world data, particularly on subpopulations that were mislabelled, corrupted, or underrepresented during training. In such settings, the set of features that a model relies on, or its feature prior, often determines the model’s ultimate reliability. While many factors contribute to a model’s feature prior, recent evidence indicates that the training dataset often plays a pivotal role. This thesis therefore aims to build the foundation for a data-centric perspective on model reliability by uncovering how the training dataset’s composition shapes the model’s feature prior, and thus the mistakes the model tends to make. It advances this objective through two main thrusts: developing scalable tools for identifying model failure modes in large datasets, and investigating the impact of pre-training data on the reliability of transfer learning models.
In the first thrust, we develop techniques for uncovering meaningful patterns of model errors, especially in settings where manual exploration is prohibitively expensive. This includes building a framework for generating counterfactual images to debug model behavior, as well as introducing a technique for automatically identifying failure modes by distilling them as directions in a latent space. We also propose a data-based approach that mitigates such failures at their source by isolating the training examples that drive a targeted bias.
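To make the latent-direction idea concrete, here is a minimal sketch (not the thesis’s exact method) of distilling a failure mode as a direction: fit a linear classifier in a shared embedding space to separate a model’s correct from incorrect examples, and treat the classifier’s normal vector as the failure direction. The inputs `embeddings` and `is_correct`, and the regularization value, are assumptions for illustration.

```python
import numpy as np
from sklearn.svm import LinearSVC

def failure_direction(embeddings: np.ndarray, is_correct: np.ndarray) -> np.ndarray:
    """Fit a linear SVM separating correct (1) from incorrect (0) examples
    in a latent space; return the unit normal of its decision boundary."""
    svm = LinearSVC(C=0.1)  # hypothetical regularization choice
    svm.fit(embeddings, is_correct)
    w = svm.coef_.ravel()
    return w / np.linalg.norm(w)

def most_failure_aligned(embeddings: np.ndarray, direction: np.ndarray, k: int = 10):
    """Rank examples by projection onto the failure direction; since the
    direction points toward correct examples, the most negative scores
    flag candidates for a coherent error pattern."""
    scores = embeddings @ direction
    return np.argsort(scores)[:k]
```

Surfacing the top-ranked examples (and captioning the direction itself) is then a cheap way to inspect a candidate failure mode without manually combing through the dataset.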
In the second thrust, we investigate the role of the pre-training data in the transfer learning setting, where a pre-trained model is adapted to a downstream task. Here, we first explore the problem of “bias transfer”, where biases from the pre-trained model can persist even after adapting the model to the downstream task. We then introduce transfer influences, a framework for pinpointing the counterfactual impact of a pre-training datapoint on the final downstream prediction. This framework enables us to isolate (and remove) detrimental points from the pre-training dataset to improve transfer learning performance.
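As a rough illustration of the counterfactual idea (a subsampling-style estimator sketched under assumed inputs, not the thesis’s exact procedure): train many models on random subsets of the pre-training data, record each model’s downstream scores, and estimate each pre-training example’s influence as the difference in average downstream performance between models that included it and models that did not. The `masks` and `outputs` arrays below are hypothetical.

```python
import numpy as np

def transfer_influences(masks: np.ndarray, outputs: np.ndarray) -> np.ndarray:
    """masks:   (n_models, n_pretrain) binary; masks[m, i] = 1 if pre-training
                example i was in model m's training subset.
       outputs: (n_models, n_downstream) per-model downstream scores
                (e.g. correct-class margins).
       Returns: (n_pretrain, n_downstream) difference-in-means estimates."""
    # Mean downstream score over models that included each pre-training example.
    included = masks.T @ outputs / np.maximum(masks.sum(axis=0), 1)[:, None]
    # Mean over models that excluded it.
    excl = 1 - masks
    excluded = excl.T @ outputs / np.maximum(excl.sum(axis=0), 1)[:, None]
    return included - excluded  # positive => the example helps that prediction

# Pre-training examples with the most negative aggregate influence are
# natural candidates for removal before transfer.
```

The estimator trades compute (many subset-trained models) for a direct counterfactual reading of each datapoint’s effect, which is what makes removal decisions principled rather than heuristic.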
Date issued
2024-02
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology