DSpace@MIT (MIT Libraries)

Automated Interpretation of Machine Learning Models

Author(s)
Hernandez, Evan
Advisor
Andreas, Jacob
Terms of use
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) Copyright retained by author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract
As machine learning (ML) models are increasingly deployed in production, there’s a pressing need to ensure their reliability through auditing, debugging, and testing. Interpretability, the subfield that studies how ML models make decisions, aspires to meet this need but traditionally relies on human-led experimentation or is based on human priors about what the model has learned. In this thesis, I propose that interpretability should evolve alongside ML by adopting automated techniques that use ML models to interpret ML models. This shift towards automation allows for more comprehensive analyses of ML models without requiring human scrutiny at every step, and the effectiveness of these methods should improve as the ML models themselves become more sophisticated. I present three examples of automated interpretability approaches: using a captioning model to label features of other models, manipulating an ML model’s internal representations to predict and correct errors, and identifying simple internal circuits through approximating the ML model itself. These examples lay the groundwork for future efforts in automating ML model interpretation.
Date issued
2024-05
URI
https://hdl.handle.net/1721.1/156277
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Doctoral Theses