Techniques for Interpretability and Transparency of Black-Box Models
Author(s)
Zhou, Yilun
Thesis PDF (26.90 MB)
Advisor
Shah, Julie A.
Abstract
The last decade has witnessed immense progress in machine learning, which is now deployed in domains such as healthcare, finance, and justice. However, recent advances are largely powered by deep neural networks, whose opacity hinders people's ability to inspect these models. Furthermore, legal frameworks are being proposed that would require a degree of model understanding as a prerequisite to deployment and use. These factors have spurred research on increasing the interpretability and transparency of such models.
This thesis makes several contributions in this direction. We start with a concise but practical overview of current techniques for defining and evaluating explanations of model predictions. Then, we observe a novel duality between the definitions and evaluations of various interpretability concepts, propose a new way to generate explanations, and study the properties of these new explanations. Next, we investigate two fundamental properties of good explanations in detail: correctness, i.e., whether an explanation reflects the model's internal decision-making logic, and understandability, i.e., whether humans can accurately infer higher-level and more general model behaviors from explanations. For each property, we propose evaluations to assess existing model explanation methods and discuss their strengths and weaknesses. Following this, we ask which instances should be explained, and introduce the transparency-by-example perspective as an answer. We demonstrate its benefits in revealing hidden properties of both image classifiers and robot controllers. Last, the thesis identifies directions for future research and advocates for a tighter integration of model interpretability and transparency into the ecosystem of trustworthy machine learning research, which also encompasses efforts such as fairness, robustness, and privacy.
Date issued
2023-02
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology