| dc.contributor.advisor | Torralba, Antonio | |
| dc.contributor.author | Rodríguez Muñoz, Adrián | |
| dc.date.accessioned | 2024-08-21T18:58:16Z | |
| dc.date.available | 2024-08-21T18:58:16Z | |
| dc.date.issued | 2024-05 | |
| dc.date.submitted | 2024-07-10T12:59:51.460Z | |
| dc.identifier.uri | https://hdl.handle.net/1721.1/156344 | |
| dc.description.abstract | Models resistant to adversarial perturbations are stable in the neighbourhoods of input images, such that small changes, known as adversarial attacks, cannot dramatically change the prediction. Currently, this stability is obtained with Adversarial Training, which directly teaches models to be robust by training on the perturbed examples themselves. In this work, we show that simply regularizing the input-gradients of unperturbed examples achieves surprisingly similar performance. Regularizing the input-gradient norm is commonly believed to be significantly worse than Adversarial Training. Our experiments determine that the performance of Gradient Norm regularization critically depends on the smoothness of the model's activation functions, and that it is in fact highly performant on modern vision transformers, which natively use smooth GELUs rather than piecewise-linear ReLUs. On ImageNet-1K, Gradient Norm regularization achieves more than 90% of the performance of state-of-the-art Adversarial Training with PGD-3 (52% vs. 56%) in 60% of the training time and without complex inner maximization. Further experiments shed light on additional properties relating model robustness to the input-gradients of unperturbed images, such as asymmetric color statistics. Surprisingly, we also show that significant adversarial robustness can be obtained by simply conditioning gradients to focus on image edges, without explicit regularization of the norm. | |
| dc.publisher | Massachusetts Institute of Technology | |
| dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) | |
| dc.rights | Copyright retained by author(s) | |
| dc.rights.uri | https://creativecommons.org/licenses/by-nc-nd/4.0/ | |
| dc.title | Adversarial robustness without perturbations | |
| dc.type | Thesis | |
| dc.description.degree | S.M. | |
| dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
| mit.thesis.degree | Master | |
| thesis.degree.name | Master of Science in Electrical Engineering and Computer Science | |
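Below is a minimal sketch of the input-gradient-norm regularizer described in the abstract above, written as a PyTorch-style loss function on unperturbed images. The function name, the `lambda_gnorm` weight, and the double-backpropagation setup are illustrative assumptions for exposition, not the thesis' exact training recipe or hyperparameters.

```python
# Hypothetical sketch of gradient-norm regularization on clean (unperturbed)
# examples: penalize the norm of the loss gradient with respect to the input.
import torch
import torch.nn.functional as F

def gradient_norm_loss(model, images, labels, lambda_gnorm=1.0):
    # Track gradients with respect to the input pixels.
    images = images.clone().requires_grad_(True)
    logits = model(images)
    ce = F.cross_entropy(logits, labels)
    # Input gradient of the clean loss; create_graph=True lets the penalty
    # itself be backpropagated through (double backpropagation).
    grad = torch.autograd.grad(ce, images, create_graph=True)[0]
    grad_norm = grad.flatten(1).norm(dim=1).mean()
    # Standard classification loss plus the input-gradient-norm penalty.
    return ce + lambda_gnorm * grad_norm
```

Per the abstract, a penalty of this form is most effective when the model uses smooth activations (e.g. GELU in vision transformers), since piecewise-linear ReLUs make the input-gradient a poor local summary of the loss surface.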