Understanding non-robust features in image classification
Author(s): Wei, Kuo-An Andy.
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Despite the remarkable success of deep neural networks on image classification tasks, they exhibit a surprising vulnerability to certain small worst-case perturbations, known as adversarial examples. Over the years, many theories have been proposed to explain this puzzling phenomenon. Recent work by Ilyas et al. offers a fresh take on the existence of adversarial examples: that adversarial examples are inevitable due to certain well-generalizing but non-robust features present in the natural data. We build upon the "non-robust features" framework introduced by Ilyas et al. and present new observations on the properties of non-robust features. We showcase visualization techniques based on adversarial attacks that help build an intuitive understanding of non-robust features. Lastly, we propose a novel framework, adversarial transferability analysis, for examining the types of information present in non-robust features.
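To make the notion of a "small worst-case perturbation" concrete, here is a minimal illustrative sketch (not code from the thesis) of a fast-gradient-sign-style attack. A tiny logistic-regression model stands in for a deep network; the weights, input, and epsilon are all assumptions chosen for illustration. Moving the input a small step in the sign of the input-gradient of the loss reliably increases that loss.

```python
import numpy as np

# Illustrative FGSM-style perturbation (hypothetical toy model, not the thesis's code).
rng = np.random.default_rng(0)
w = rng.normal(size=16)   # fixed "trained" weights (assumption)
x = rng.normal(size=16)   # a clean input
y = 1.0                   # its true label

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(x_in):
    # binary cross-entropy of the model's prediction against y
    p = sigmoid(w @ x_in)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# Gradient of the loss w.r.t. the *input* (closed form for this linear-logit model):
# dL/dx = (sigmoid(w @ x) - y) * w
grad_x = (sigmoid(w @ x) - y) * w

eps = 0.1
x_adv = x + eps * np.sign(grad_x)  # fast gradient sign method step

print(loss(x), loss(x_adv))  # the adversarial loss is strictly higher
```

The perturbation is bounded in the L-infinity norm by `eps`, which is what makes it "small", yet it is chosen in the worst-case direction for the model's loss.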
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, May, 2020.
Cataloged from the official PDF of thesis.
Includes bibliographical references (pages 39-41).
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science