Generalized word shift graphs: a method for visualizing and explaining pairwise comparisons between texts

Gallagher, Ryan J; Frank, Morgan R; Mitchell, Lewis; Schwartz, Aaron J; Reagan, Andrew J; Danforth, Christopher M; Dodds, Peter S

dc.contributor.author	Gallagher, Ryan J
dc.contributor.author	Frank, Morgan R
dc.contributor.author	Mitchell, Lewis
dc.contributor.author	Schwartz, Aaron J
dc.contributor.author	Reagan, Andrew J
dc.contributor.author	Danforth, Christopher M
dc.contributor.author	Dodds, Peter S
dc.date.accessioned	2021-09-20T17:41:03Z
dc.date.available	2021-09-20T17:41:03Z
dc.date.issued	2021-01-19
dc.identifier.uri	https://hdl.handle.net/1721.1/131952
dc.description.abstract	A common task in computational text analyses is to quantify how two corpora differ according to a measurement like word frequency, sentiment, or information content. However, collapsing the texts’ rich stories into a single number is often conceptually perilous, and it is difficult to confidently interpret interesting or unexpected textual patterns without looming concerns about data artifacts or measurement validity. To better capture fine-grained differences between texts, we introduce generalized word shift graphs, visualizations which yield a meaningful and interpretable summary of how individual words contribute to the variation between two texts for any measure that can be formulated as a weighted average. We show that this framework naturally encompasses many of the most commonly used approaches for comparing texts, including relative frequencies, dictionary scores, and entropy-based measures like the Kullback–Leibler and Jensen–Shannon divergences. Through a diverse set of case studies ranging from presidential speeches to tweets posted in urban green spaces, we demonstrate how generalized word shift graphs can be flexibly applied across domains for diagnostic investigation, hypothesis generation, and substantive interpretation. By providing a detailed lens into textual shifts between corpora, generalized word shift graphs help computational social scientists, digital humanists, and other text analysis practitioners fashion more robust scientific narratives.	en_US
dc.publisher	Springer Berlin Heidelberg	en_US
dc.relation.isversionof	https://doi.org/10.1140/epjds/s13688-021-00260-3	en_US
dc.rights	Creative Commons Attribution	en_US
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/	en_US
dc.source	Springer Berlin Heidelberg	en_US
dc.title	Generalized word shift graphs: a method for visualizing and explaining pairwise comparisons between texts	en_US
dc.type	Article	en_US
dc.identifier.citation	EPJ Data Science. 2021 Jan 19;10(1):4	en_US
dc.contributor.department	MIT Connection Science (Research institute)
dc.identifier.mitlicense	PUBLISHER_CC
dc.eprint.version	Final published version	en_US
dc.type.uri	http://purl.org/eprint/type/JournalArticle	en_US
eprint.status	http://purl.org/eprint/status/PeerReviewed	en_US
dc.date.updated	2021-01-24T04:37:04Z
dc.language.rfc3066	en
dc.rights.holder	The Author(s)
dspace.embargo.terms	N
dspace.date.submission	2021-01-24T04:37:04Z
mit.license	PUBLISHER_CC
mit.metadata.status	Authority Work and Publication Information Needed

Files in this item

Name: 13688_2021_Article_260.pdf
Size: 2.504Mb
Format: PDF

This item appears in the following Collection(s)

MIT Open Access Articles