Adversarial Examples and Distribution Shift: A Representations Perspective
Author(s)
Nadhamuni, Kaveri
Advisor
Madry, Aleksander
Abstract
Adversarial attacks cause machine learning models to produce incorrect predictions by minimally perturbing their input. In this thesis, we take a step towards understanding how these perturbations affect the model's intermediate data representations. Specifically, we compare standard and adversarial representations for models of varying robustness using a variety of similarity metrics. We find that it is possible to detect adversarial examples by examining nearby examples, though this method can be circumvented by an adaptive attack. We then explore methods for improving generalization to natural distribution shift and hypothesize that models trained with different notions of feature bias learn fundamentally different representations. We find that combining such diverse representations can provide a more comprehensive view of the input data, potentially allowing better generalization to novel domains. Finally, we find that representation similarity metrics can be used to predict how well a model will transfer between tasks.
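The abstract does not name the specific similarity metrics used to compare standard and adversarial representations. As a hedged illustration only, the sketch below computes linear CKA (centered kernel alignment), one commonly used representation similarity measure, between layer activations on clean inputs and on perturbed inputs; the data and variable names are hypothetical and not drawn from the thesis.

```python
# Illustrative sketch (not from the thesis): linear CKA, one common metric for
# comparing two sets of intermediate representations, e.g. activations of the
# same layer on clean vs. adversarially perturbed inputs.
import numpy as np


def linear_cka(X, Y):
    """Linear CKA similarity between two activation matrices.

    X: (n_examples, d1) activations on clean inputs (hypothetical).
    Y: (n_examples, d2) activations on perturbed inputs (hypothetical).
    Returns a value in [0, 1]; higher means more similar representations.
    """
    # Center each feature dimension.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # HSIC-based formulation for the linear kernel.
    numerator = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    denominator = np.linalg.norm(X.T @ X, ord="fro") * np.linalg.norm(Y.T @ Y, ord="fro")
    return numerator / denominator


# Toy usage: near-identical representations score close to 1, unrelated ones close to 0.
rng = np.random.default_rng(0)
clean = rng.normal(size=(512, 64))
perturbed = clean + 0.05 * rng.normal(size=(512, 64))  # small perturbation, purely illustrative
print(linear_cka(clean, perturbed))                    # close to 1.0
print(linear_cka(clean, rng.normal(size=(512, 64))))   # close to 0.0
```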
Date issued
2021-06
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology