Headlines as networked language : a study of content and audience across 73 million links on Twitter

McClure, David (David W.)

Other Contributors

Program in Media Arts and Sciences (Massachusetts Institute of Technology)

Advisor

Deb Roy.

Terms of use

MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582

Abstract

How different, in a precise sense, is The New York Times from Fox News? Or - Fox from NPR, NPR from CNN, CNN from Breitbart? If we think of news organizations as producers of language, as "speakers" - how similar or different are the voices? This question of the distance between news sources is fundamental to concerns about fragmentation and polarization in the news ecosystem. A number of studies have measured the proximity between outlets in terms of overlap at the level of audience, and then defined content-level differences in terms of the underlying audience composition - for example, the fraction of the readership who have shared content from particular political candidates. The "content graph" of the news ecosystem - the set of similarities and differences at the level of the actual coverage - is often assumed to be tightly linked to the "audience graph"; and the two are even defined in terms of each other.

How exactly do these two systems interact, though? In many ways, our knowledge of the content graph is less precise than knowledge about the audience graph, which in some ways is simpler to measure. A rich line of work has studied the coverage of specific issues in news content, and recent work has started to systematically survey the content produced by a range of outlets, often by way of unsupervised approaches that characterize differences at the level of topic. Building on this, I attempt to precisely quantify the relative similarities among major media organizations from the standpoint of textual discriminability, focusing on a corpus of 1.2 million article headlines from 15 major US news outlets, extracted from an archive of 73 million links posted on Twitter over a 625-day period running from the beginning of 2017 through the summer of 2018.

I formulate the question as a supervised learning problem, in which classifiers are presented with a headline and trained to identify the outlet that produced it. This training objective is used to induce high-quality distributed representations of headlines, and also makes it possible to measure the degree to which different outlets produce similar and dissimilar content. I then contextualize these language-level similarities against two backdrops. First, I examine the degree to which similarities at the level of headlines correlate with similarities at the level of audiences - with specific focus on sites of misalignment, where outlets "speak" in ways that don't match the typical patterns of other outlets that share similar audiences.
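The supervised framing above can be illustrated with a minimal sketch. It is not the thesis's model: it swaps in a small bag-of-words naive Bayes classifier, and the outlet names, headlines, and helper functions (`train`, `predict_proba`, `confusion`) are all hypothetical. The idea it shows is only that the probability mass a classifier assigns to the wrong outlet can be read as a rough measure of how similar two outlets' headlines are.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data invented for illustration (not from the thesis corpus).
TRAIN = [
    ("senate passes budget bill after long debate", "outlet_a"),
    ("white house responds to senate budget vote", "outlet_a"),
    ("congress debates new budget bill", "outlet_b"),
    ("senate vote on budget bill delayed", "outlet_b"),
    ("which pizza topping are you take this quiz", "outlet_c"),
    ("take this quiz to find your perfect dessert", "outlet_c"),
]
HELD_OUT = [
    ("budget bill heads to senate vote", "outlet_a"),
    ("congress delays budget debate", "outlet_b"),
    ("take this dessert quiz", "outlet_c"),
]

def train(pairs):
    """Count words per outlet and headlines per outlet."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, label in pairs:
        doc_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab

def predict_proba(text, model):
    """Multinomial naive Bayes with add-one smoothing; unseen words are skipped."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for word in text.split():
            if word in vocab:
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        log_scores[label] = score
    # Normalize in a numerically stable way.
    top = max(log_scores.values())
    exp_scores = {k: math.exp(v - top) for k, v in log_scores.items()}
    z = sum(exp_scores.values())
    return {k: v / z for k, v in exp_scores.items()}

def confusion(pairs, model):
    """Rows: true outlet. Columns: average probability given to each outlet."""
    labels = list(model[1])
    sums = {t: {p: 0.0 for p in labels} for t in labels}
    counts = Counter()
    for text, label in pairs:
        counts[label] += 1
        for pred, prob in predict_proba(text, model).items():
            sums[label][pred] += prob
    return {t: {p: sums[t][p] / counts[t] for p in labels} for t in counts}

def similarity(matrix, a, b):
    """Symmetrized confusion between two outlets."""
    return (matrix[a][b] + matrix[b][a]) / 2

model = train(TRAIN)
matrix = confusion(HELD_OUT, model)
```

In this toy setup, `similarity(matrix, "outlet_a", "outlet_b")` comes out higher than `similarity(matrix, "outlet_a", "outlet_c")`, because the two politics-style outlets share vocabulary that the quiz-style outlet does not.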

Among the news organizations considered in this study, the Associated Press and The Hill are the two most "misaligned" outlets, and we can perhaps look to specific portions of their content as a signal for the types of topics, styles, and stances that might be effective at permeating across axes of political and cultural difference. Second, I study headlines as a historical process. How stable are the linguistic profiles of major news organizations, and to what degree have they evolved into new configurations? I find significant changes over the first 18 months of the Trump presidency, with BuzzFeed doubling down on "quiz" articles; Huffington Post moving away from lifestyle content and towards political reporting; The Daily Kos becoming less exclusively focused on politics; and Fox shifting towards a kind of "tabloid" style, with a focus on violent crime, personal misfortune, and socially-charged political issues.
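One way to make the notion of "misalignment" between headline-level and audience-level similarity concrete is sketched below. The abstract does not state the measure used in the thesis, so this is an assumption: each outlet is scored by one minus the Pearson correlation between its content similarities and its audience similarities to every other outlet. The outlet names and all similarity values are hypothetical.

```python
import math

OUTLETS = ["outlet_a", "outlet_b", "outlet_c", "outlet_d"]

# Hypothetical symmetric similarity scores in [0, 1], keyed by outlet pair.
CONTENT = {
    ("outlet_a", "outlet_b"): 0.8, ("outlet_a", "outlet_c"): 0.3,
    ("outlet_a", "outlet_d"): 0.2, ("outlet_b", "outlet_c"): 0.4,
    ("outlet_b", "outlet_d"): 0.3, ("outlet_c", "outlet_d"): 0.7,
}
AUDIENCE = {
    ("outlet_a", "outlet_b"): 0.7, ("outlet_a", "outlet_c"): 0.2,
    ("outlet_a", "outlet_d"): 0.6, ("outlet_b", "outlet_c"): 0.3,
    ("outlet_b", "outlet_d"): 0.2, ("outlet_c", "outlet_d"): 0.3,
}

def lookup(scores, a, b):
    """Read a symmetric score regardless of the order of the pair."""
    return scores[(a, b)] if (a, b) in scores else scores[(b, a)]

def pearson(xs, ys):
    """Plain Pearson correlation; raises if either input is constant."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

def misalignment(outlet):
    """1 - correlation between content and audience similarity profiles."""
    others = [o for o in OUTLETS if o != outlet]
    content_row = [lookup(CONTENT, outlet, o) for o in others]
    audience_row = [lookup(AUDIENCE, outlet, o) for o in others]
    return 1.0 - pearson(content_row, audience_row)

scores = {outlet: misalignment(outlet) for outlet in OUTLETS}
most_misaligned = max(scores, key=scores.get)
```

With these made-up numbers, `outlet_d` is the most misaligned: its headlines resemble `outlet_c`, while its audience overlaps most with `outlet_a`. A rank correlation could be substituted for Pearson if only the ordering of similarities is trusted.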

Description

Thesis: S.M., Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2019

Cataloged from PDF version of thesis.

Includes bibliographical references (pages 82-84).

Date issued

2019

URI

https://hdl.handle.net/1721.1/121838

Department

Program in Media Arts and Sciences (Massachusetts Institute of Technology)

Publisher

Massachusetts Institute of Technology

Keywords

Program in Media Arts and Sciences

Collections

Graduate Theses