Detecting bias in image classification using model explanations
Author(s)
Tong, Schrasing.
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Lalana Kagal.
Abstract
Machine learning has risen greatly in popularity and many of its applications have begun to impact our daily lives. However, recent research has shown that trained models can produce biased predictions and discriminate against socially vulnerable groups. To address these problems, researchers have proposed various definitions of fairness as desirable forms of equality, often relying on sensitive attributes such as race or gender. In the image domain, the study of bias extends beyond fairness and sensitive attributes, as biased models tend to generalize poorly in real-world applications, especially when the distribution differs slightly from the test set. Detecting bias in these scenarios is extremely challenging due to the many possible causes of bias, a problem further exacerbated by the lack of explicit labels for these features. Research on explanation generation focuses on providing insight into a model's decision-making process and claims to detect bias in image classification. In this thesis, I investigated whether this claim holds true by proposing a list of important characteristics, including the ability to detect the cause and degree of bias, efficiency in terms of the human effort involved, human understandability, and scalability to multiple biases, and by analyzing whether two popular explanation mechanisms, GRAD-CAM and TCAV, actually achieve them. To this end, I curated two datasets with fine-grained labels and balanced sample sizes for the biased features and trained models with different degrees of bias by altering the data composition. Doing so allowed me to generate explanations for different models and observe how the explanations change as the underlying data bias, and in turn the set of discriminating features, shifts. Although explanations help detect bias in most scenarios, they produced noisy results and performed poorly in estimating the degree of bias present. Aside from assessing the limits of explanations in bias detection, the approach employed in this thesis also serves as a novel method for evaluating the faithfulness of generated explanations.
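As an illustration of the kind of explanation mechanism evaluated in the thesis, the following is a minimal Grad-CAM sketch in PyTorch. It assumes a torchvision ResNet-18 with its last convolutional block ("layer4") as the target layer; the model, layer choice, and helper names are illustrative assumptions, not the thesis's actual pipeline.

# Minimal Grad-CAM sketch (illustrative; assumes torchvision ResNet-18, target layer "layer4").
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=None).eval()
target_layer = model.layer4

activations, gradients = {}, {}

def fwd_hook(module, inputs, output):
    # Cache the feature maps produced by the target layer.
    activations["value"] = output.detach()

def bwd_hook(module, grad_input, grad_output):
    # Cache the gradient of the class score w.r.t. those feature maps.
    gradients["value"] = grad_output[0].detach()

target_layer.register_forward_hook(fwd_hook)
target_layer.register_full_backward_hook(bwd_hook)

def grad_cam(image, class_idx=None):
    """Return an [H, W] heatmap in [0, 1] highlighting evidence for class_idx."""
    logits = model(image)                      # image: [1, 3, H, W]
    if class_idx is None:
        class_idx = logits.argmax(dim=1).item()
    model.zero_grad()
    logits[0, class_idx].backward()

    acts = activations["value"]                # [1, C, h, w]
    grads = gradients["value"]                 # [1, C, h, w]
    weights = grads.mean(dim=(2, 3), keepdim=True)            # channel importance
    cam = F.relu((weights * acts).sum(dim=1, keepdim=True))   # weighted combination
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear", align_corners=False)
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)

heatmap = grad_cam(torch.randn(1, 3, 224, 224))

Inspecting where such heatmaps concentrate across models trained on differently composed datasets is one way to observe whether an explanation tracks the shift in discriminating features described above.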
Description
Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, May 2020. Cataloged from the official PDF of thesis. Includes bibliographical references (pages 67-69).
Date issued
2020
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.