Statistical inference from dependent data: networks and Markov chains

Author(s)

Dikkala, Sai Nishanth.

Other Contributors

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.

Advisor

Constantinos Daskalakis.

Terms of use

MIT theses may be protected by copyright. Please reuse MIT thesis content according to the MIT Libraries Permissions Policy, which is available through the URL provided. http://dspace.mit.edu/handle/1721.1/7582

Abstract

In recent decades, the study of high-dimensional probability has taken center stage within many research communities, including Computer Science, Statistics and Machine Learning. Very often, because of the process by which data is collected, the samples in a dataset have implicit correlations among them. Such correlations are commonly ignored as a first approximation when analyzing the statistical and computational aspects of an inference task. In this thesis, we explore how to model such dependences between samples using structured high-dimensional distributions that result from imposing a Markovian property on the joint distribution of the data, namely Markov Random Fields (MRFs) and Markov chains. On MRFs, we explore a way to quantify the amount of dependence, and we strengthen previously known measure concentration results under a weak dependence condition on an MRF called the high-temperature regime. We then apply our novel measure concentration bounds to improve the accuracy of samples computed according to a certain Markov Chain Monte Carlo procedure. Next, we show how to extend some classical results from statistical learning theory on PAC-learnability and uniform convergence to training data that is dependent under the high-temperature condition. We also explore the task of regression on data that is dependent according to an MRF under a stronger amount of dependence than the high-temperature condition allows. We then shift our focus to Markov chains, where we explore the question of testing whether a trajectory we observe corresponds to a chain P or not. We discuss what a reasonable formulation of this problem is and provide a tester which works without observing a trajectory whose length contains multiplicative factors of the mixing or covering time of the chain P. We conclude with some broad directions for further research on statistical inference under data dependence.

Description

Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, May, 2020

Cataloged from the official PDF of thesis.

Includes bibliographical references (pages 259-270).

Date issued

2020

URI

https://hdl.handle.net/1721.1/127016

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Keywords

Electrical Engineering and Computer Science.

Collections

Doctoral Theses