Improved methodology for evaluating adversarial robustness in deep neural networks

Lee, Kyungmi,(Computer scientist)Massachusetts Institute of Technology.

dc.contributor.advisor	Anantha P. Chandrakasan.	en_US
dc.contributor.author	Lee, Kyungmi,(Computer scientist)Massachusetts Institute of Technology.	en_US
dc.contributor.other	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.	en_US
dc.date.accessioned	2020-09-15T21:53:27Z
dc.date.available	2020-09-15T21:53:27Z
dc.date.copyright	2020	en_US
dc.date.issued	2020	en_US
dc.identifier.uri	https://hdl.handle.net/1721.1/127350
dc.description	Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, May, 2020	en_US
dc.description	Cataloged from the official PDF of thesis.	en_US
dc.description	Includes bibliographical references (pages 89-93).	en_US
dc.description.abstract	Deep neural networks are known to be vulnerable to adversarial perturbations, which are often imperceptible to humans but can alter predictions of machine learning systems. Since the exact value of adversarial robustness is difficult to obtain for complex deep neural networks, accuracy of the models against perturbed examples generated by attack methods is empirically used as a proxy to adversarial robustness. However, failure of attack methods to find adversarial perturbations cannot be equated with being robust.	en_US
dc.description.abstract	In this work, we identify three common cases that lead to overestimation of accuracy against perturbed examples generated by bounded first-order attack methods: 1) the value of cross-entropy loss numerically becoming zero when using standard floating point representation, resulting in non-useful gradients; 2) innately non-differentiable functions in deep neural networks, such as Rectified Linear Unit (ReLU) activation and MaxPool operation, incurring "gradient masking" [2]; and 3) certain regularization methods used during training inducing the model to be less amenable to first-order approximation. We show that these phenomena exist in a wide range of deep neural networks, and that these phenomena are not limited to specific defense methods they have been previously investigated for.	en_US
dc.description.abstract	For each case, we propose compensation methods that either address sources of inaccurate gradient computation, such as numerical saturation for near zero values and non-differentiability, or reduce the total number of back-propagations for iterative attacks by approximating second-order information. These compensation methods can be combined with existing attack methods for a more precise empirical evaluation metric. We illustrate the impact of these three phenomena with examples of practical interest, such as benchmarking model capacity and regularization techniques for robustness. Furthermore, we show that the gap between adversarial accuracy and the guaranteed lower bound of robustness can be partially explained by these phenomena. Overall, our work shows that overestimated adversarial accuracy that is not indicative of robustness is prevalent even for conventionally trained deep neural networks, and highlights cautions of using empirical evaluation without guaranteed bounds.	en_US
dc.description.statementofresponsibility	by Kyungmi Lee.	en_US
dc.format.extent	93 pages	en_US
dc.language.iso	eng	en_US
dc.publisher	Massachusetts Institute of Technology	en_US
dc.rights	MIT theses may be protected by copyright. Please reuse MIT thesis content according to the MIT Libraries Permissions Policy, which is available through the URL provided.	en_US
dc.rights.uri	http://dspace.mit.edu/handle/1721.1/7582	en_US
dc.subject	Electrical Engineering and Computer Science.	en_US
dc.title	Improved methodology for evaluating adversarial robustness in deep neural networks	en_US
dc.type	Thesis	en_US
dc.description.degree	S.M.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science	en_US
dc.identifier.oclc	1192484009	en_US
dc.description.collection	S.M. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science	en_US
dspace.imported	2020-09-15T21:53:27Z	en_US
mit.thesis.degree	Master	en_US
mit.thesis.department	EECS	en_US

Files in this item

Name:: 1192484009-MIT.pdf
Size:: 2.567Mb
Format:: PDF

View/Open

This item appears in the following Collection(s)

Graduate Theses

Show simple item record