Persistent cascades and the structure of influence in a communication network
Author(s): Morse, Steven T
Massachusetts Institute of Technology. Operations Research Center.
Marta C. González.
We present work on identifying, modeling, and predicting the structure of influence in a communication network. We focus on cellular phone data, which provides a near-global population sample (in contrast to the relatively limited scope of social media and other internet-based datasets) at the expense of losing any knowledge of the content of the communications themselves.

First, using inexact tree matching and hierarchical clustering, we propose a novel method for extracting persistent patterns of communication among individuals, which we term persistent cascades. We find the cascades are short in duration ('bursty'), exhibit habitual hierarchy and long-term persistence, and reveal new roles in weekday vs. weekend spreading. We show that the persistent cascades in the data are significantly different from what is found in a random network, which we illustrate both analytically and through simulation. We show that persistent cascade membership increases the likelihood of receiving information spreading through the network, even after controlling for overall call activity. Finally, we show that the method is extensible to other communication datasets by applying it to an email dataset. In this case study, we find our approach correctly identifies key individuals, ignores noise, and identifies several interesting email chains.

Second, we propose a probabilistic model for the influence structure of a network, based on a multivariate stochastic process called a Hawkes process. We develop a novel approach for parameter estimation in this model that uses a Bayesian expectation-maximization (EM) scheme with a network prior. We first apply the model in the univariate case to the group conversations identified using the persistent cascades methodology. We find that the model performs well as a predictor, and also that the estimated parameter values reveal two types of persistent cascades: low-activity conversations with high temporal clustering, and high-activity conversations with moderate temporal clustering. We then apply the model in the multivariate case to samples of the cell phone data, finding that the resulting estimate of the influence matrix extends our findings with the persistent cascades.
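
For readers unfamiliar with Hawkes processes, the following is a minimal sketch of the conditional intensity such a model uses, in both the univariate and multivariate cases. It is an illustration only, not the thesis's implementation: the exponential kernel, the parameter names (`mu`, `alpha`, `beta`), the function names, and the convention that `alpha[j][i]` is the influence of node `j` on node `i` are all assumptions, and the Bayesian EM estimation step described in the abstract is not shown.

```python
import math

def hawkes_intensity(t, events, mu, alpha, beta):
    """Univariate intensity (assumed exponential kernel):
    lambda(t) = mu + sum over past events t_i < t of
                alpha * beta * exp(-beta * (t - t_i)).
    mu is the background rate, alpha the expected number of follow-on
    events triggered by one event, beta the decay rate."""
    return mu + sum(
        alpha * beta * math.exp(-beta * (t - t_i))
        for t_i in events
        if t_i < t
    )

def multivariate_intensity(t, events, mu, alpha, beta):
    """Multivariate intensity for every node at time t.
    events is a list of (time, node) pairs; mu is a list of background
    rates; alpha[j][i] is the (hypothetical) influence of node j on
    node i, i.e. one entry of an influence matrix."""
    lam = list(mu)
    for t_k, j in events:
        if t_k < t:
            decay = beta * math.exp(-beta * (t - t_k))
            for i in range(len(mu)):
                lam[i] += alpha[j][i] * decay
    return lam
```

In this parameterization, a larger `alpha` together with a larger `beta` gives short, intense bursts after each event, which is one way to read "temporal clustering" in the estimated parameters.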
Thesis: S.M., Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, 2017. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 90-95).
Department: Massachusetts Institute of Technology. Operations Research Center.
Massachusetts Institute of Technology
Operations Research Center.