Generating Black-Box Adversarial Examples for Text Classifiers Using a Deep Reinforced Model

Vijayaraghavan, P; Roy, D

dc.contributor.author	Vijayaraghavan, P
dc.contributor.author	Roy, D
dc.date.accessioned	2021-11-02T12:24:29Z
dc.date.available	2021-11-02T12:24:29Z
dc.date.issued	2020
dc.identifier.uri	https://hdl.handle.net/1721.1/137065
dc.description.abstract	© Springer Nature Switzerland AG 2020. Recently, generating adversarial examples has become an important means of measuring the robustness of a deep learning model. Adversarial examples help us identify the susceptibilities of the model and further counter those vulnerabilities by applying adversarial training techniques. In the natural language domain, small perturbations in the form of misspellings or paraphrases can drastically change the semantics of the text. We propose a reinforcement learning based approach towards generating adversarial examples in black-box settings. We demonstrate that our method is able to fool well-trained models for (a) the IMDB sentiment classification task and (b) the AG’s news corpus news categorization task with significantly high success rates. We find that the adversarial examples generated are semantics-preserving perturbations to the original text.	en_US
dc.language.iso	en
dc.publisher	Springer International Publishing	en_US
dc.relation.isversionof	10.1007/978-3-030-46147-8_43	en_US
dc.rights	Creative Commons Attribution-Noncommercial-Share Alike	en_US
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/4.0/	en_US
dc.source	arXiv	en_US
dc.title	Generating Black-Box Adversarial Examples for Text Classifiers Using a Deep Reinforced Model	en_US
dc.type	Article	en_US
dc.identifier.citation	Vijayaraghavan, P and Roy, D. 2020. "Generating Black-Box Adversarial Examples for Text Classifiers Using a Deep Reinforced Model." Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 11907 LNAI.
dc.contributor.department	Massachusetts Institute of Technology. Media Laboratory
dc.relation.journal	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)	en_US
dc.eprint.version	Author's final manuscript	en_US
dc.type.uri	http://purl.org/eprint/type/ConferencePaper	en_US
eprint.status	http://purl.org/eprint/status/NonPeerReviewed	en_US
dc.date.updated	2021-07-01T16:55:21Z
dspace.orderedauthors	Vijayaraghavan, P; Roy, D	en_US
dspace.date.submission	2021-07-01T16:55:22Z
mit.journal.volume	11907 LNAI	en_US
mit.license	OPEN_ACCESS_POLICY
mit.metadata.status	Authority Work and Publication Information Needed	en_US

Files in this item

Name: 1909.07873.pdf
Size: 635.4Kb
Format: PDF
Description: Accepted version

This item appears in the following Collection(s)

MIT Open Access Articles
