Statistical inference from dependent data : networks and Markov chains
Author(s)Dikkala, Sai Nishanth.
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
MetadataShow full item record
In recent decades, the study of high-dimensional probability has taken centerstage within many research communities including Computer Science, Statistics and Machine Learning. Very often, due to the process according to which data is collected, the samples in a dataset have implicit correlations amongst them. Such correlations are commonly ignored as a first approximation when trying to analyze statistical and computational aspects of an inference task. In this thesis, we explore how to model such dependences between samples using structured high-dimensional distributions which result from imposing a Markovian property on the joint distribution of the data, namely Markov Random Fields (MRFs) and Markov chains. On MRFs, we explore a quantification for the amount of dependence and we strengthen previously known measure concentration results under a certain weak dependence condition on an MRF called the high-temperature regime. We then go on to apply our novel measure concentration bounds to improve the accuracy of samples computed according to a certain Markov Chain Monte Carlo procedure. We then show how to extend some classical results from statistical learning theory on PAC-learnability and uniform convergence to training data which is dependent under the high temperature condition. Then, we explore the task of regression on data which is dependent according to an MRF under a stronger amount of dependence than is allowed by the high-temperature condition. We then shift our focus to Markov chains where we explore the question of testing whether a certain trajectory we observe corresponds to a chain P or not. We discuss what is a reasonable formulation of this problem and provide a tester which works without observing a trajectory whose length contains multiplicative factors of the mixing or covering time of the chain P. We finally conclude with some broad directions for further research on statistical inference under data dependence.
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, May, 2020Cataloged from the official PDF of thesis.Includes bibliographical references (pages 259-270).
DepartmentMassachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Massachusetts Institute of Technology
Electrical Engineering and Computer Science.