Building and using robust representations in image classification

Author(s)

Tran, Brandon Vanhuy.

Other Contributors

Massachusetts Institute of Technology. Department of Mathematics.

Advisor

Aleksander Madry.

Terms of use

MIT theses may be protected by copyright. Please reuse MIT thesis content according to the MIT Libraries Permissions Policy, which is available through the URL provided. http://dspace.mit.edu/handle/1721.1/7582

Abstract

One of the major appeals of the deep learning paradigm is the ability to learn high-level feature representations of complex data. These learned representations obviate manual data pre-processing and are versatile enough to generalize across tasks. However, they are not yet capable of fully capturing abstract, meaningful features of the data. For instance, the pervasiveness of adversarial examples (small perturbations of correctly classified inputs that cause model misclassification) is a prominent indication of such shortcomings. The goal of this thesis is to work towards building learned representations that are more robust and human-aligned. To achieve this, we turn to adversarial (or robust) training, an optimization technique for training networks that are less prone to adversarial inputs. Typically, robust training is studied purely in the context of machine learning security, as a safeguard against adversarial examples. In contrast, we cast it as a means of enforcing an additional prior onto the model. Specifically, it has been noticed that, much like the well-known convolutional or recurrent priors, the robust prior serves as a "bias" that restricts the features models can use in classification: it does not allow for any features that change upon small perturbations. We find that the addition of this simple prior enables a number of downstream applications, from feature visualization and manipulation to input interpolation and image synthesis. Most importantly, robust training provides a simple way of interpreting and understanding model decisions. Besides diagnosing incorrect classification, this also has consequences in the so-called "data poisoning" setting, where an adversary corrupts training samples in the hope of causing misbehavior in the resulting model. We find that in many cases, the prior arising from robust training significantly helps in detecting data poisoning.

Description

Thesis: Ph.D., Massachusetts Institute of Technology, Department of Mathematics, May 2020

Cataloged from the official PDF of thesis.

Includes bibliographical references (pages 115-131).

Date issued

2020

URI

https://hdl.handle.net/1721.1/127912

Department

Massachusetts Institute of Technology. Department of Mathematics

Publisher

Massachusetts Institute of Technology

Keywords

Mathematics.

Collections

Doctoral Theses