| dc.contributor.advisor | Torralba, Antonio | |
| dc.contributor.author | Rodríguez Muñoz, Adrián | |
| dc.date.accessioned | 2024-08-21T18:58:16Z | |
| dc.date.available | 2024-08-21T18:58:16Z | |
| dc.date.issued | 2024-05 | |
| dc.date.submitted | 2024-07-10T12:59:51.460Z | |
| dc.identifier.uri | https://hdl.handle.net/1721.1/156344 | |
| dc.description.abstract | Models resistant to adversarial perturbations are stable in the neighbourhoods of input images, such that small changes, known as adversarial attacks, cannot dramatically change the prediction. Currently, this stability is obtained with Adversarial Training, which directly teaches models to be robust by training on the perturbed examples themselves. In this work, we show that simply regularizing the input-gradients of unperturbed examples achieves surprisingly similar performance. Regularizing the input-gradient norm is commonly believed to be significantly worse than Adversarial Training. Our experiments determine that the performance of Gradient Norm regularization critically depends on the smoothness of the model's activation functions, and that it is in fact highly performant on modern vision transformers, which natively use smooth GELUs rather than piecewise-linear ReLUs. On ImageNet-1K, Gradient Norm regularization achieves more than 90% of the performance of state-of-the-art Adversarial Training with PGD-3 (52% vs. 56%) in 60% of the training time and without complex inner maximization. Further experiments shed light on additional properties relating model robustness to the input-gradients of unperturbed images, such as asymmetric color statistics. Surprisingly, we also show that significant adversarial robustness can be obtained by simply conditioning gradients to focus on image edges, without explicit regularization of the norm. | |
| dc.publisher | Massachusetts Institute of Technology | |
| dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) | |
| dc.rights | Copyright retained by author(s) | |
| dc.rights.uri | https://creativecommons.org/licenses/by-nc-nd/4.0/ | |
| dc.title | Adversarial robustness without perturbations | |
| dc.type | Thesis | |
| dc.description.degree | S.M. | |
| dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
| mit.thesis.degree | Master | |
| thesis.degree.name | Master of Science in Electrical Engineering and Computer Science | |
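Below is a minimal sketch of the input-gradient-norm regularizer described in the abstract above, written as a PyTorch-style loss function on unperturbed images. The function name, the `lambda_gnorm` weight, and the double-backpropagation setup are illustrative assumptions for exposition, not the thesis' exact training recipe or hyperparameters.

```python
# Hypothetical sketch of gradient-norm regularization on clean (unperturbed)
# examples: penalize the norm of the loss gradient with respect to the input.
import torch
import torch.nn.functional as F

def gradient_norm_loss(model, images, labels, lambda_gnorm=1.0):
    # Track gradients with respect to the input pixels.
    images = images.clone().requires_grad_(True)
    logits = model(images)
    ce = F.cross_entropy(logits, labels)
    # Input gradient of the clean loss; create_graph=True lets the penalty
    # itself be backpropagated through (double backpropagation).
    grad = torch.autograd.grad(ce, images, create_graph=True)[0]
    grad_norm = grad.flatten(1).norm(dim=1).mean()
    # Standard classification loss plus the input-gradient-norm penalty.
    return ce + lambda_gnorm * grad_norm
```

Per the abstract, a penalty of this form is most effective when the model uses smooth activations (e.g. GELU in vision transformers), since piecewise-linear ReLUs make the input-gradient a poor local summary of the loss surface.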