
dc.contributor.advisor  Shah, Julie A.
dc.contributor.author  Zhou, Yilun
dc.date.accessioned  2023-03-31T14:37:23Z
dc.date.available  2023-03-31T14:37:23Z
dc.date.issued  2023-02
dc.date.submitted  2023-02-28T14:39:27.455Z
dc.identifier.uri  https://hdl.handle.net/1721.1/150171
dc.description.abstract  The last decade witnessed immense progress in machine learning, which has been deployed in many domains such as healthcare, finance, and justice. However, recent advances are largely powered by deep neural networks, whose opacity hinders people's ability to inspect these models. Furthermore, legal requirements are being proposed that demand a level of model understanding as a prerequisite to their deployment and use. These factors have spurred research that increases the interpretability and transparency of these models. This thesis makes several contributions in this direction. We start with a concise but practical overview of current techniques for defining and evaluating explanations of model predictions. Then, we observe a novel duality between definitions and evaluations of various interpretability concepts, propose a new way to generate explanations, and study the properties of these new explanations. Next, we investigate two fundamental properties of good explanations in detail: correctness -- whether the explanations reflect the model's internal decision-making logic, and understandability -- whether humans can accurately infer higher-level and more general model behaviors from these explanations. For each aspect, we propose evaluations to assess existing model explanation methods and discuss their strengths and weaknesses. Following this, we ask which instances to explain, and introduce the transparency-by-example perspective as an answer to this question. We demonstrate its benefits in revealing hidden properties of both image classifiers and robot controllers. Lastly, the thesis identifies directions for future research and advocates for a tighter integration of model interpretability and transparency into the ecosystem of trustworthy machine learning research, which also encompasses efforts such as fairness, robustness, and privacy.
dc.publisher  Massachusetts Institute of Technology
dc.rights  In Copyright - Educational Use Permitted
dc.rights  Copyright MIT
dc.rights.uri  http://rightsstatements.org/page/InC-EDU/1.0/
dc.title  Techniques for Interpretability and Transparency of Black-Box Models
dc.type  Thesis
dc.description.degree  Ph.D.
dc.contributor.department  Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree  Doctoral
thesis.degree.name  Doctor of Philosophy

