| dc.description.abstract | The success of modern data science is largely driven by access to large-scale, high-dimensional data. Much of classical machine learning has been developed under the assumption that this data is generated independently from some distribution. However, this assumption is often violated when data exhibit complex dependencies across a spatial or temporal domain, or due to social interactions. In this thesis, our goal is to design and analyze methods that address these dependencies in three fundamental estimation tasks: unsupervised learning, supervised learning, and counterfactual inference. In unsupervised learning, we observe a sequence of unlabeled examples and our goal is to infer some structural property of the distribution from which they were drawn. The presence of dependencies can severely complicate this task. Our results in this direction encompass both fully observable and latent variable models. For fully observable models, we use the celebrated Ising model to describe the dependencies. Assuming access to a single sample from some Ising model, a setting that captures a variety of real-world scenarios, we design and analyze polynomial-time algorithms for recovering the interaction matrix corresponding to the network structure of the model. We then leverage these techniques to obtain improved guarantees for estimating Ising models in Total Variation (TV) distance from multiple samples. For latent variable models, we focus on the case where the structure is a tree and samples are observed only at the leaves, a common scenario in phylogenetics. Assuming the model is Gaussian, we analyze the behavior of the Expectation-Maximization (EM) algorithm, a popular heuristic for latent variable models. We show that for trees with a single latent node, EM converges to the true model, and that for general tree topologies, the only stationary point in the interior of the domain is the true model. We then shift our focus to discrete models and study latent tree Ising models, for which we provide polynomial-time algorithms for learning the distribution of the leaves in TV distance. In supervised learning, we observe a sequence of feature-label pairs and our task is to learn the predictive relationship between the features and the labels. Here, this relationship can be confounded by dependencies among the labels. We formulate this question as a regression problem in which the labels of the units follow the joint distribution of an Ising model with an unknown strength parameter and external fields determined by the regression function. We characterize the minimax optimal rate of estimation for the various parameters and provide an efficient algorithm that achieves it. Interestingly, in some cases it may not be possible to estimate all of the parameters. In counterfactual inference, we focus on the design of network experiments, where the treatment of one unit can affect the outcome of a neighboring unit in an underlying graph. Our goal is to estimate a general causal effect, defined as the average difference in a unit's outcomes under two different interventions. For an arbitrary such effect, we propose an experimental design called the conflict graph design. For an unbiased estimator of that effect, we prove variance bounds that yield the best known rates of estimation for various effects studied in the literature, such as the average direct effect and the total effect, while also providing estimation rates for effects that have received less attention from the perspective of experimental design. | |
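
For concreteness, the Ising model at the center of the unsupervised-learning results is the standard pairwise graphical model over the hypercube. The display below states it in generic textbook notation; the symbols J, h, and Z are our choice for this sketch, not necessarily the thesis's own.

```latex
% Standard Ising model over {-1,+1}^n (textbook form; the symbols
% J, h, Z are generic and not taken from the thesis itself).
\[
  p_{J,h}(x) \;=\; \frac{1}{Z(J,h)}
  \exp\!\Big( \sum_{i<j} J_{ij}\, x_i x_j \;+\; \sum_{i=1}^{n} h_i\, x_i \Big),
  \qquad x \in \{-1,+1\}^n .
\]
```

Here J is the symmetric interaction matrix whose nonzero pattern encodes the network structure, h collects the external fields, and Z(J, h) is the partition function. The single-sample problem mentioned above asks to recover J from one draw x ~ p_{J,h}.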
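
To make the single-sample setting concrete: one configuration of the model already couples all n coordinates, so it carries information about every row of J. The sketch below is an illustrative Gibbs sampler, not an algorithm from the thesis; it generates such a dependent sample.

```python
import numpy as np

def gibbs_ising(J, h, n_sweeps=500, rng=None):
    """Draw one configuration x in {-1,+1}^n from an Ising model with
    symmetric interaction matrix J (zero diagonal) and external fields h,
    via Gibbs sampling. Illustrative only; not taken from the thesis."""
    rng = np.random.default_rng(rng)
    n = len(h)
    x = rng.choice([-1.0, 1.0], size=n)
    for _ in range(n_sweeps):
        for i in range(n):
            # The conditional law of x_i given the rest is logistic in
            # its local field sum_j J[i, j] * x[j] + h[i].
            local_field = J[i] @ x - J[i, i] * x[i] + h[i]
            p_plus = 1.0 / (1.0 + np.exp(-2.0 * local_field))
            x[i] = 1.0 if rng.random() < p_plus else -1.0
    return x
```

A call such as x = gibbs_ising(J, h) returns a single n-dimensional configuration, which is exactly the kind of one-shot, fully dependent observation the structure-recovery results operate on.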
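
The single-latent-node Gaussian tree is equivalent to a one-factor Gaussian model: a hidden root z ~ N(0, 1) with observed leaves x_i = a_i z + eps_i. Under that assumed parameterization, EM has the closed-form updates sketched below; these are the standard factor-analysis steps, shown only to make the iteration being analyzed explicit.

```python
import numpy as np

def em_one_latent_gaussian_tree(X, n_iters=200, rng=None):
    """EM for a Gaussian tree with one latent root: z ~ N(0, 1) and
    leaves x_i = a_i * z + eps_i with eps_i ~ N(0, psi_i), i.e. a
    one-factor model. Standard updates; a sketch, not the thesis's code."""
    rng = np.random.default_rng(rng)
    T, d = X.shape
    a = rng.normal(size=d)            # edge weights (loadings), random init
    psi = np.ones(d)                  # leaf noise variances
    s_diag = (X ** 2).mean(axis=0)    # per-leaf empirical second moments
    for _ in range(n_iters):
        # E-step: the posterior of z given each sample is Gaussian, with
        # a shared variance q and per-sample means m.
        q = 1.0 / (1.0 + np.sum(a ** 2 / psi))
        m = q * (X @ (a / psi))
        # M-step: closed-form updates for loadings and noise variances.
        b = (X * m[:, None]).mean(axis=0)        # (1/T) sum_t m_t x_t
        a = b / (q + np.mean(m ** 2))
        psi = np.maximum(s_diag - a * b, 1e-8)   # keep variances positive
    return a, psi
```

Note that a is identifiable only up to a global sign flip (z and -z induce the same leaf distribution), so convergence to the true model is understood modulo that symmetry.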
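
One natural formalization of the dependent-label regression problem reads as follows; the symbols beta, E, and f_theta are notation we introduce for illustration, and the thesis's exact parameterization may differ.

```latex
% Hypothetical formalization: labels y follow an Ising model whose
% external fields come from the regression function f_theta; beta, E,
% and f_theta are our notation, not necessarily the thesis's.
\[
  p_{\beta,\theta}(y \mid x_{1:n}) \;\propto\;
  \exp\!\Big( \beta \sum_{(i,j)\in E} y_i y_j
  \;+\; \sum_{i=1}^{n} y_i\, f_\theta(x_i) \Big),
  \qquad y \in \{-1,+1\}^n .
\]
```

Here beta is the unknown dependence strength over the interaction graph E, and the external field of unit i is the regression function evaluated at its features. Both beta and theta must then be estimated from a single joint draw of the labels.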
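
Finally, the general causal effects targeted by the conflict graph design can be written, in standard potential-outcome notation (the symbols are again our choice), as an average contrast between two interventions.

```latex
% Average contrast between two population-level interventions a and b;
% standard potential-outcome notation, symbols chosen for this sketch.
\[
  \tau(a, b) \;=\; \frac{1}{n} \sum_{i=1}^{n} \big( y_i(a) - y_i(b) \big),
\]
```

where y_i(z) denotes the potential outcome of unit i when the whole population receives the treatment vector z in {0,1}^n, so interference along the graph is allowed. The total effect, for instance, takes a to be the all-treated and b the all-control assignment, while the average direct effect contrasts assignments that differ only in a unit's own treatment.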