Adversarial robustness without perturbations
Author(s)
Rodríguez Muñoz, Adrián
Advisor
Torralba, Antonio
Abstract
Models resistant to adversarial perturbations are stable around the neighbourhoods of input images, such that small changes, known as adversarial attacks, cannot dramatically change the prediction. Currently, this stability is obtained with Adversarial Training, which directly teaches models to be robust by training on the perturbed examples themselves. In this work, we show that surprisingly similar performance can be obtained by instead regularizing the model's input-gradients on unperturbed examples only. Regularizing the input-gradient norm is commonly believed to be significantly worse than Adversarial Training. Our experiments show that the performance of Gradient Norm regularization critically depends on the smoothness of the model's activation functions, and that it is in fact highly performant on modern vision transformers, which natively use smooth GeLUs rather than piecewise-linear ReLUs. On ImageNet-1K, Gradient Norm regularization achieves more than 90% of the performance of state-of-the-art Adversarial Training with PGD-3 (52% vs. 56%) in 60% of the training time and without complex inner maximization. Further experiments shed light on additional properties relating model robustness to the input-gradients of unperturbed images, such as asymmetric color statistics. Surprisingly, we also show that significant adversarial robustness can be obtained by simply conditioning gradients to focus on image edges, without explicit regularization of the norm.
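For illustration, the following is a minimal sketch of input-gradient norm regularization as described above, assuming a PyTorch-style classifier. The function name gradient_norm_loss, the weight lam, and the L2 norm with mean reduction are illustrative assumptions, not the thesis's exact formulation or hyperparameters.

import torch
import torch.nn.functional as F

def gradient_norm_loss(model, x, y, lam=1.0):
    # Illustrative sketch: cross-entropy plus a penalty on the norm of the
    # input-gradient, computed on unperturbed images only (no PGD inner loop).
    x = x.clone().requires_grad_(True)   # track gradients w.r.t. the input pixels
    logits = model(x)
    ce = F.cross_entropy(logits, y)

    # Gradient of the loss w.r.t. the input; create_graph=True keeps the
    # computation differentiable so the penalty itself can be backpropagated.
    (grad_x,) = torch.autograd.grad(ce, x, create_graph=True)
    penalty = grad_x.flatten(1).norm(p=2, dim=1).mean()

    return ce + lam * penalty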
Date issued
2024-05
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology