The Limitations of Stylometry for Detecting Machine-Generated Fake News

Schuster, Tal; Schuster, Roei; Shah, Darsh J; Barzilay, Regina

dc.contributor.author	Schuster, Tal
dc.contributor.author	Schuster, Roei
dc.contributor.author	Shah, Darsh J
dc.contributor.author	Barzilay, Regina
dc.date.accessioned	2021-10-27T20:23:24Z
dc.date.available	2021-10-27T20:23:24Z
dc.date.issued	2020
dc.identifier.uri	https://hdl.handle.net/1721.1/135419
dc.description.abstract	© 2020 Association for Computational Linguistics. Recent developments in neural language models (LMs) have raised concerns about their potential misuse for automatically spreading misinformation. In light of these concerns, several studies have proposed to detect machine-generated fake news by capturing their stylistic differences from human-written text. These approaches, broadly termed stylometry, have found success in source attribution and misinformation detection in human-written texts. However, in this work, we show that stylometry is limited against machine-generated misinformation. Whereas humans speak differently when trying to deceive, LMs generate stylistically consistent text, regardless of underlying motive. Thus, though stylometry can successfully prevent impersonation by identifying text provenance, it fails to distinguish legitimate LM applications from those that introduce false information. We create two benchmarks demonstrating the stylistic similarity between malicious and legitimate uses of LMs, utilized in auto-completion and editing-assistance settings. Our findings highlight the need for non-stylometry approaches in detecting machine-generated misinformation, and open up the discussion on the desired evaluation benchmarks.
dc.language.iso	en
dc.publisher	MIT Press - Journals
dc.relation.isversionof	10.1162/COLI_A_00380
dc.rights	Creative Commons Attribution-NonCommercial-NoDerivs License
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.source	MIT Press
dc.title	The Limitations of Stylometry for Detecting Machine-Generated Fake News
dc.type	Article
dc.contributor.department	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
dc.relation.journal	Computational Linguistics
dc.eprint.version	Final published version
dc.type.uri	http://purl.org/eprint/type/JournalArticle
eprint.status	http://purl.org/eprint/status/PeerReviewed
dc.date.updated	2020-12-01T17:58:51Z
dspace.orderedauthors	Schuster, T; Schuster, R; Shah, DJ; Barzilay, R
dspace.date.submission	2020-12-01T17:58:54Z
mit.journal.volume	46
mit.journal.issue	2
mit.license	PUBLISHER_CC
mit.metadata.status	Authority Work and Publication Information Needed

Files in this item

Name: coli_a_00380.pdf
Size: 242.2Kb
Format: PDF
Description: Published version

This item appears in the following Collection(s)

MIT Open Access Articles