Understanding non-robust features in image classification
Author(s): Wei, Kuo-An Andy.
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Despite the remarkable success of deep neural networks on image classification tasks, they exhibit a surprising vulnerability to certain small worst-case perturbations, known as adversarial examples. Over the years, many theories have been proposed to explain this puzzling phenomenon. Recent work by Ilyas et al. offers a fresh take on the existence of adversarial examples: that adversarial examples are inevitable due to certain well-generalizing but non-robust features present in the natural data. We build upon the "non-robust features" framework introduced by Ilyas et al. and present new observations on the properties of non-robust features. We showcase visualization techniques based on adversarial attacks that help build an intuitive understanding of non-robust features. Lastly, we propose a novel framework, adversarial transferability analysis, for examining the types of information present in non-robust features.
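To make the notion of a "small worst-case perturbation" concrete, here is a minimal illustrative sketch (not code from the thesis) of a fast-gradient-sign-style attack. A tiny logistic-regression model stands in for a deep network; the weights, input, and epsilon are all assumptions chosen for illustration. Moving the input a small step in the sign of the input-gradient of the loss reliably increases that loss.

```python
import numpy as np

# Illustrative FGSM-style perturbation (hypothetical toy model, not the thesis's code).
rng = np.random.default_rng(0)
w = rng.normal(size=16)   # fixed "trained" weights (assumption)
x = rng.normal(size=16)   # a clean input
y = 1.0                   # its true label

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(x_in):
    # binary cross-entropy of the model's prediction against y
    p = sigmoid(w @ x_in)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# Gradient of the loss w.r.t. the *input* (closed form for this linear-logit model):
# dL/dx = (sigmoid(w @ x) - y) * w
grad_x = (sigmoid(w @ x) - y) * w

eps = 0.1
x_adv = x + eps * np.sign(grad_x)  # fast gradient sign method step

print(loss(x), loss(x_adv))  # the adversarial loss is strictly higher
```

The perturbation is bounded in the L-infinity norm by `eps`, which is what makes it "small", yet it is chosen in the worst-case direction for the model's loss.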
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, May, 2020.
Cataloged from the official PDF of thesis.
Includes bibliographical references (pages 39-41).
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science