
Adversarial robustness without perturbations

Author(s)
Rodríguez Muñoz, Adrián
Download: Thesis PDF (15.16 MB)
Advisor
Torralba, Antonio
Terms of use
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0). Copyright retained by author(s). https://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract
Models resistant to adversarial perturbations are stable in the neighbourhoods of input images, such that small changes, known as adversarial attacks, cannot dramatically change the prediction. Currently, this stability is obtained with Adversarial Training, which directly teaches models to be robust by training on the perturbed examples themselves. In this work, we show that surprisingly similar performance can be obtained by instead regularizing the model's input-gradients on unperturbed examples only. Regularizing the input-gradient norm is commonly believed to be significantly worse than Adversarial Training. Our experiments determine that the performance of Gradient Norm regularization critically depends on the smoothness of the model's activation functions, and that it is in fact highly performant on modern vision transformers, which natively use smooth GeLUs rather than piecewise-linear ReLUs. On ImageNet-1K, Gradient Norm regularization achieves more than 90% of the performance of state-of-the-art Adversarial Training with PGD-3 (52% vs. 56%) in 60% of the training time and without complex inner maximization. Further experiments shed light on additional properties relating model robustness and the input-gradients of unperturbed images, such as asymmetric color statistics. Surprisingly, we also show that significant adversarial robustness can be obtained by simply conditioning gradients to focus on image edges, without explicit regularization of the norm.
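
The regularizer described in the abstract, penalizing the norm of the loss gradient with respect to clean, unperturbed inputs, can be sketched in a few lines of PyTorch. The helper name gradient_norm_loss and the weight lambda_gn below are illustrative assumptions, not the thesis's exact training recipe:

```python
import torch
import torch.nn.functional as F

def gradient_norm_loss(model, images, labels, lambda_gn=1.0):
    """Cross-entropy on clean images plus an input-gradient norm penalty.

    A minimal sketch of Gradient Norm regularization: no adversarial
    perturbations are generated; only the gradients w.r.t. the
    unperturbed inputs are penalized.
    """
    images = images.clone().requires_grad_(True)
    logits = model(images)
    ce = F.cross_entropy(logits, labels)
    # Gradient of the loss w.r.t. the input pixels
    # (create_graph=True enables backprop through this gradient).
    (grad,) = torch.autograd.grad(ce, images, create_graph=True)
    grad_norm = grad.flatten(start_dim=1).norm(dim=1).mean()
    return ce + lambda_gn * grad_norm
```

In a training loop this loss simply replaces the plain cross-entropy and is backpropagated as usual; per the abstract, the penalty is most effective on architectures with smooth activations such as GeLU.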
Date issued
2024-05
URI
https://hdl.handle.net/1721.1/156344
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses
