
dc.contributor.author: Chen, Hongge
dc.contributor.author: Boning, Duane S
dc.date.accessioned: 2021-03-05T11:58:35Z
dc.date.available: 2021-03-05T11:58:35Z
dc.date.issued: 2019-05
dc.date.submitted: 2019-01
dc.identifier.uri: https://hdl.handle.net/1721.1/130088
dc.description.abstract: The adversarial training procedure proposed by Madry et al. (2018) is one of the most effective methods to defend against adversarial examples in deep neural networks (DNNs). In our paper, we shed light on the practicality and the hardness of adversarial training by showing that the effectiveness (robustness on the test set) of adversarial training has a strong correlation with the distance between a test point and the manifold of training data embedded by the network. Test examples that are relatively far away from this manifold are more likely to be vulnerable to adversarial attacks. Consequently, an adversarial training based defense is susceptible to a new class of attacks, the “blind-spot attack”, where the input images reside in “blind-spots” (low-density regions) of the empirical distribution of training data but still lie on the ground-truth data manifold. For MNIST, these blind-spots can be found simply by scaling and shifting image pixel values. Most importantly, for large datasets with high-dimensional and complex data manifolds (CIFAR, ImageNet, etc.), the existence of blind-spots in adversarial training makes it difficult to defend all valid test examples, due to the curse of dimensionality and the scarcity of training data. Additionally, we find that blind-spots also exist for provable defenses, including those of Kolter & Wong (2018) and Sinha et al. (2018), because these trainable robustness certificates can only be practically optimized on a limited set of training data. [en_US]
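As an informal illustration of two ideas in the abstract, the scale-and-shift transformation used to reach blind-spots and a distance-to-training-manifold proxy, here is a minimal Python/NumPy sketch. It assumes pixel values in [0, 1]; the function names (scale_shift, knn_distance), the alpha/beta values, and the use of a k-nearest-neighbor distance in an embedding space are illustrative assumptions, not the paper's exact implementation.

import numpy as np

def scale_shift(x, alpha=0.9, beta=0.05):
    # Candidate blind-spot input x' = alpha * x + beta, clipped to [0, 1].
    # The image remains a recognizable digit (it stays on the ground-truth
    # data manifold), but its pixel statistics drift away from the empirical
    # training distribution, i.e. toward a low-density "blind spot".
    return np.clip(alpha * x + beta, 0.0, 1.0)

def knn_distance(feat, train_feats, k=5):
    # Mean Euclidean distance from one embedded test point to its k nearest
    # embedded training points: a rough proxy for the distance between the
    # test point and the manifold of training data embedded by the network.
    d = np.linalg.norm(train_feats - feat, axis=1)
    return float(np.sort(d)[:k].mean())

# Illustrative usage with random stand-ins; in practice x would be a real
# MNIST test image, the embeddings would come from a trained network, and
# the transformed image would then be attacked with a standard method.
x = np.random.rand(28, 28)
x_blind_spot = scale_shift(x, alpha=0.8, beta=0.1)
test_feat = np.random.rand(64)
train_feats = np.random.rand(1000, 64)
print(knn_distance(test_feat, train_feats, k=5))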
dc.language.iso: en
dc.publisher: ICLR [en_US]
dc.relation.isversionof: https://openreview.net/forum?id=HylTBhA5tQ [en_US]
dc.rights: Creative Commons Attribution-Noncommercial-Share Alike [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/ [en_US]
dc.source: arXiv [en_US]
dc.title: The limitations of adversarial training and the blind-spot attack [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Zhang, Huan et al. “The limitations of adversarial training and the blind-spot attack.” Paper presented at the 7th International Conference on Learning Representations, ICLR 2019, New Orleans, Louisiana, May 6-9, 2019, ICLR © 2019 The Author(s) [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science [en_US]
dc.relation.journal: 7th International Conference on Learning Representations, ICLR 2019 [en_US]
dc.eprint.version: Author's final manuscript [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
eprint.status: http://purl.org/eprint/status/NonPeerReviewed [en_US]
dc.date.updated: 2020-12-03T16:03:40Z
dspace.orderedauthors: Zhang, H; Chen, H; Song, Z; Boning, D; Dhillon, I; Hsieh, CJ [en_US]
dspace.date.submission: 2020-12-03T16:03:44Z
mit.license: OPEN_ACCESS_POLICY
mit.metadata.status: Complete

