Detecting bias in image classification using model explanations
Author(s)
Tong, Schrasing.
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Lalana Kagal.
Abstract
Machine learning has risen greatly in popularity and many of its applications have begun to impact our daily lives. However, recent research has shown that trained models can produce biased predictions and discriminate against socially vulnerable groups. To address these problems, researchers have proposed various definitions of fairness as desirable forms of equality, often relying on sensitive attributes such as race or gender. In the image domain, the study of bias extends beyond fairness and sensitive attributes, as biased models tend to generalize poorly in real-world applications, especially when the distribution differs slightly from the test set. Detecting bias in these scenarios is extremely challenging due to the many possible causes of bias, a problem further exacerbated by the lack of explicit labels for these features. Research on explanation generation focuses on providing insight into a model's decision-making process and claims to detect bias in image classification. In this thesis, I investigated whether this claim holds true by proposing a list of important characteristics, including the ability to detect the cause and degree of bias, efficiency in terms of the human effort involved, human understandability, and scalability to multiple biases, and by analyzing whether two popular explanation mechanisms, GRAD-CAM and TCAV, actually achieve them. To this end, I curated two datasets with fine-grained labels and balanced sample sizes for the biased features and trained models with different degrees of bias by altering the data composition. Doing so allowed me to generate explanations for different models and observe how the explanations change as the underlying data bias, and in turn the set of discriminating features, shifts. Although explanations help detect bias in most scenarios, they produced noisy results and performed poorly in estimating the degree of bias present. Aside from assessing the limits of explanations in bias detection, the approach employed in this thesis also serves as a novel method for evaluating the faithfulness of generated explanations.
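As an illustration of the kind of explanation mechanism evaluated in the thesis, the following is a minimal Grad-CAM sketch in PyTorch. It assumes a torchvision ResNet-18 with its last convolutional block ("layer4") as the target layer; the model, layer choice, and helper names are illustrative assumptions, not the thesis's actual pipeline.

# Minimal Grad-CAM sketch (illustrative; assumes torchvision ResNet-18, target layer "layer4").
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=None).eval()
target_layer = model.layer4

activations, gradients = {}, {}

def fwd_hook(module, inputs, output):
    # Cache the feature maps produced by the target layer.
    activations["value"] = output.detach()

def bwd_hook(module, grad_input, grad_output):
    # Cache the gradient of the class score w.r.t. those feature maps.
    gradients["value"] = grad_output[0].detach()

target_layer.register_forward_hook(fwd_hook)
target_layer.register_full_backward_hook(bwd_hook)

def grad_cam(image, class_idx=None):
    """Return an [H, W] heatmap in [0, 1] highlighting evidence for class_idx."""
    logits = model(image)                      # image: [1, 3, H, W]
    if class_idx is None:
        class_idx = logits.argmax(dim=1).item()
    model.zero_grad()
    logits[0, class_idx].backward()

    acts = activations["value"]                # [1, C, h, w]
    grads = gradients["value"]                 # [1, C, h, w]
    weights = grads.mean(dim=(2, 3), keepdim=True)            # channel importance
    cam = F.relu((weights * acts).sum(dim=1, keepdim=True))   # weighted combination
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear", align_corners=False)
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)

heatmap = grad_cam(torch.randn(1, 3, 224, 224))

Inspecting where such heatmaps concentrate across models trained on differently composed datasets is one way to observe whether an explanation tracks the shift in discriminating features described above.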
Description
Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, May 2020. Cataloged from the official PDF of thesis. Includes bibliographical references (pages 67-69).
Date issued
2020
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.