Show simple item record

dc.contributor.author: Gallagher, Ryan J
dc.contributor.author: Frank, Morgan R
dc.contributor.author: Mitchell, Lewis
dc.contributor.author: Schwartz, Aaron J
dc.contributor.author: Reagan, Andrew J
dc.contributor.author: Danforth, Christopher M
dc.contributor.author: Dodds, Peter S
dc.date.accessioned: 2021-09-20T17:41:03Z
dc.date.available: 2021-09-20T17:41:03Z
dc.date.issued: 2021-01-19
dc.identifier.uri: https://hdl.handle.net/1721.1/131952
dc.description.abstract: A common task in computational text analyses is to quantify how two corpora differ according to a measurement like word frequency, sentiment, or information content. However, collapsing the texts’ rich stories into a single number is often conceptually perilous, and it is difficult to confidently interpret interesting or unexpected textual patterns without looming concerns about data artifacts or measurement validity. To better capture fine-grained differences between texts, we introduce generalized word shift graphs, visualizations which yield a meaningful and interpretable summary of how individual words contribute to the variation between two texts for any measure that can be formulated as a weighted average. We show that this framework naturally encompasses many of the most commonly used approaches for comparing texts, including relative frequencies, dictionary scores, and entropy-based measures like the Kullback–Leibler and Jensen–Shannon divergences. Through a diverse set of case studies ranging from presidential speeches to tweets posted in urban green spaces, we demonstrate how generalized word shift graphs can be flexibly applied across domains for diagnostic investigation, hypothesis generation, and substantive interpretation. By providing a detailed lens into textual shifts between corpora, generalized word shift graphs help computational social scientists, digital humanists, and other text analysis practitioners fashion more robust scientific narratives. [en_US]
dc.publisher: Springer Berlin Heidelberg [en_US]
dc.relation.isversionof: https://doi.org/10.1140/epjds/s13688-021-00260-3 [en_US]
dc.rights: Creative Commons Attribution [en_US]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ [en_US]
dc.source: Springer Berlin Heidelberg [en_US]
dc.title: Generalized word shift graphs: a method for visualizing and explaining pairwise comparisons between texts [en_US]
dc.type: Article [en_US]
dc.identifier.citation: EPJ Data Science. 2021 Jan 19;10(1):4 [en_US]
dc.contributor.department: MIT Connection Science (Research institute)
dc.identifier.mitlicense: PUBLISHER_CC
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/JournalArticle [en_US]
eprint.status: http://purl.org/eprint/status/PeerReviewed [en_US]
dc.date.updated: 2021-01-24T04:37:04Z
dc.language.rfc3066: en
dc.rights.holder: The Author(s)
dspace.embargo.terms: N
dspace.date.submission: 2021-01-24T04:37:04Z
mit.license: PUBLISHER_CC
mit.metadata.status: Authority Work and Publication Information Needed


