Robustness may be at odds with accuracy

Tsipras, Dimitris; Santurkar, Shibani (Shibani Vinay); Engstrom, Logan G.; Turner, Alexander M.; Madry, Aleksander

Terms of use

Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/

Abstract

We show that there exists an inherent tension between the goal of adversarial robustness and that of standard generalization. Specifically, training robust models may not only be more resource-consuming, but also lead to a reduction of standard accuracy. We demonstrate that this trade-off between the standard accuracy of a model and its robustness to adversarial perturbations provably exists even in a fairly simple and natural setting. These findings also corroborate a similar phenomenon observed in practice. Further, we argue that this phenomenon is a consequence of robust classifiers learning fundamentally different feature representations than standard classifiers. These differences, in particular, seem to result in unexpected benefits: the features learned by robust models tend to align better with salient data characteristics and human perception.

Date issued

2019-04

URI

https://hdl.handle.net/1721.1/130090

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Journal

7th International Conference on Learning Representations, ICLR 2019

Publisher

ICLR

Citation

Tsipras, Dimitris et al. “Robustness may be at odds with accuracy.” Paper presented at the 7th International Conference on Learning Representations, ICLR 2019, New Orleans, Louisiana, May 6–9, 2019. ICLR © 2019 The Author(s)

Version: Author's final manuscript

Collections

MIT Open Access Articles