Building and using robust representations in image classification
Author(s)
Tran, Brandon Vanhuy.
Download1197636585-MIT.pdf (24.49Mb)
Other Contributors
Massachusetts Institute of Technology. Department of Mathematics.
Advisor
Aleksander Madry.
Terms of use
Metadata
Show full item recordAbstract
One of the major appeals of the deep learning paradigm is the ability to learn high-level feature representations of complex data. These learned representations obviate manual data pre-processing, and are versatile enough to generalize across tasks. However, they are not yet capable of fully capturing abstract, meaningful features of the data. For instance, the pervasiveness of adversarial examples--small perturbations of correctly classified inputs causing model misclassification--is a prominent indication of such shortcomings. The goal of this thesis is to work towards building learned representations that are more robust and human-aligned. To achieve this, we turn to adversarial (or robust) training, an optimization technique for training networks less prone to adversarial inputs. Typically, robust training is studied purely in the context of machine learning security (as a safeguard against adversarial examples)--in contrast, we will cast it as a means of enforcing an additional prior onto the model. Specifically, it has been noticed that, in a similar manner to the well-known convolutional or recurrent priors, the robust prior serves as a "bias" that restricts the features models can use in classification--it does not allow for any features that change upon small perturbations. We find that the addition of this simple prior enables a number of downstream applications, from feature visualization and manipulation to input interpolation and image synthesis. Most importantly, robust training provides a simple way of interpreting and understanding model decisions. Besides diagnosing incorrect classification, this also has consequences in the so-called "data poisoning" setting, where an adversary corrupts training samples with the hope of causing misbehaviour in the resulting model. We find that in many cases, the prior arising from robust training significantly helps in detecting data poisoning.
Description
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Mathematics, May, 2020 Cataloged from the official PDF of thesis. Includes bibliographical references (pages 115-131).
Date issued
2020Department
Massachusetts Institute of Technology. Department of MathematicsPublisher
Massachusetts Institute of Technology
Keywords
Mathematics.