Structure, dynamics, and inference in networks

Chodrow, Philip S.(Philip Samuel)

Other Contributors

Massachusetts Institute of Technology. Operations Research Center.

Advisor

Patrick Jaillet and Marta González.

Terms of use

MIT theses may be protected by copyright. Please reuse MIT thesis content according to the MIT Libraries Permissions Policy, which is available through the URL provided. http://dspace.mit.edu/handle/1721.1/7582

Abstract

Networks offer a unified, conceptual formalism for reasoning about complex, relational systems. While pioneering work in network science focused primarily on the ability of "universal" models to explain the features of observed systems, contemporary research increasingly focuses on challenges and opportunities for data analysis in complex systems. In this thesis we study four problems, each motivated by the need for theory-informed modeling in network data science.

The first chapter is a study of binary-state adaptive voter models (AVMs). AVMs model the emergence of global opinion-based network polarization from localized decision-making, doing so through a simple coupling of node and edge states. This coupling yields rich behavior, including phase transitions and low-dimensional quasistable manifolds. However, the coupling also makes these models extremely difficult to analyze. Exploiting a novel asymmetry in the local dynamics, we provide low-dimensional approximations of unprecedented accuracy for one AVM variant, and of competitive accuracy for another.

In the second chapter, we continue our focus on fragmentation in social systems with a study of spatial segregation. While the question of how to measure and quantify segregation has received extensive treatment in the sociological literature, this treatment tends to be mathematically disjoint. As a result, scholars often re-prove the same results for special cases of measures and grapple with incomparable methods for incorporating the role of space in their analyses. We provide contributions to address each of these issues. With respect to the first, we unify a large body of extant segregation measures through the calculus of Bregman divergences, showing that the most popular measures are instantiations of generalized mutual informations. We then formulate a microscopic measure of spatial structure, the local information density, and prove a novel information-geometric result in order to measure it on real data in the common case in which the data is embedded in a planar network. Using these tools, we are then able to formulate and evaluate several network-based regionalization algorithms for multiscale spatial analysis.

We then take up two questions in null random graph modeling. The first develops a family of null random models for hypergraphs, the natural mathematical representation of polyadic networks in which multiple entities interact simultaneously. We formulate two distributions over spaces of hypergraphs subject to fixed node degree and edge dimension sequences, and provide Markov chain Monte Carlo algorithms for sampling from them. We then conduct a sequence of experiments to highlight the role of hypergraph configuration models in the data science of polyadic networks. We show that (a) the use of hypergraph nulls can lead to directionally different conclusions in hypothesis testing than the use of traditional nulls and that (b) polyadic nulls support richer and more complex measurements of graph structure. We close with a formulation of a novel measure of correlation in hypergraphs, as well as an asymptotic formula for estimating its expectations under one of our configuration models.

In the final chapter, we study the expected adjacency matrix of a uniformly random multigraph with a fixed degree sequence. This matrix is an input into several common network analyses, including community detection and mean-field theories of spreading properties on contact networks. The actual structure of this matrix, however, is not well understood. The main issues are (a) the combinatorial complexity of the space on which this random graph is defined and (b) an erroneous folk theorem among network scientists which stems from confusion with related models. By studying the dynamics of a Markov chain sampler, we prove a sequence of approximations that allow us to estimate the expected adjacency matrix, and other elementwise moments, using a fast numerical scheme with qualified uniqueness guarantees. We illustrate using a series of experiments on primary and secondary school contact networks, showing order-of-magnitude improvements over extant methods. We conclude with a description of several directions of future work.

Description

Thesis: Ph. D., Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, September, 2020

Cataloged from student-submitted PDF of thesis.

Includes bibliographical references (pages 187-203).

Date issued

2020

URI

https://hdl.handle.net/1721.1/128994

Department

Massachusetts Institute of Technology. Operations Research Center; Sloan School of Management

Publisher

Massachusetts Institute of Technology

Keywords

Operations Research Center.

Collections

Doctoral Theses