dc.contributor.advisor | Madry, Aleksander | |
dc.contributor.author | Nadhamuni, Kaveri | |
dc.date.accessioned | 2022-01-14T14:40:01Z | |
dc.date.available | 2022-01-14T14:40:01Z | |
dc.date.issued | 2021-06 | |
dc.date.submitted | 2021-06-17T20:13:55.731Z | |
dc.identifier.uri | https://hdl.handle.net/1721.1/138945 | |
dc.description.abstract | Adversarial attacks cause machine learning models to produce wrong predictions by minimally perturbing their input. In this thesis, we take a step towards understanding how these perturbations affect the intermediate data representations of the model. Specifically, we compare standard and adversarial representations for models of varying robustness based on a variety of similarity metrics. In fact, we find that it’s possible to detect adversarial examples by examining nearby examples, though we also find that this method can be circumvented by an adaptive attack. We then explore methods to improve generalization to natural distribution shift and hypothesize that models trained with different notions of feature bias will learn fundamentally different representations. We find that combining such diverse representations can provide a more comprehensive representation of the input data, potentially allowing better generalization to novel domains. Finally, we find that representation similarity metrics can be used to predict how well a model will be able to transfer between tasks. | |
dc.publisher | Massachusetts Institute of Technology | |
dc.rights | In Copyright - Educational Use Permitted | |
dc.rights | Copyright MIT | |
dc.rights.uri | http://rightsstatements.org/page/InC-EDU/1.0/ | |
dc.title | Adversarial Examples and Distribution Shift: A Representations Perspective | |
dc.type | Thesis | |
dc.description.degree | M.Eng. | |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
mit.thesis.degree | Master | |
thesis.degree.name | Master of Engineering in Electrical Engineering and Computer Science | |