MIT Libraries logoDSpace@MIT

MIT
View Item 
  • DSpace@MIT Home
  • MIT Libraries
  • MIT Theses
  • Doctoral Theses
  • View Item
  • DSpace@MIT Home
  • MIT Libraries
  • MIT Theses
  • Doctoral Theses
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.

Computational and statistical challenges in high dimensional statistical models

Author(s)
Zadik, Ilias
Thumbnail
Download1138021665-MIT.pdf (2.717Mb)
Other Contributors
Massachusetts Institute of Technology. Operations Research Center.
Advisor
David Gamarnik.
Terms of use
MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582
Metadata
Show full item record
Abstract
This thesis focuses on two long-studied high-dimensional statistical models, namely (1) the high-dimensional linear regression (HDLR) model, where the goal is to recover a hidden vector of coefficients from noisy linear observations, and (2) the planted clique (PC) model, where the goal is to recover a hidden community structure from a much larger observed network. The following results are established. First, under assumptions, we identify the exact statistical limit of the model, that is the minimum signal strength allowing a statistically accurate inference of the hidden vector. We couple this result with an all-or-nothing information theoretic (IT) phase transition. We prove that above the statistical limit, it is IT possible to almost-perfectly recover the hidden vector, while below the statistical limit, it is IT impossible to achieve non-trivial correlation with the hidden vector.
 
Second, we study the computational-statistical gap of the sparse HDLR model; The statistical limit of the model is significantly smaller than its apparent computational limit, which is the minimum signal strength required by known computationally-efficient methods to perform statistical inference. We propose an explanation of the gap by analyzing the Overlap Gap Property (OGP) for HDLR. The OGP is known to be linked with algorithmic hardness in the theory of average-case optimization. We prove that the OGP for HDLR appears, up-to-constants, simultaneously with the computational-statistical gap, suggesting the OGP is a fundamental source of algorithmic hardness for HDLR. Third, we focus on noiseless HDLR. Here we do not assume sparsity, but we make a certain rationality assumption on the coefficients. In this case, we propose a polynomial-time recovery method based on the Lenstra-Lenstra-Lóvasz lattice basis reduction algorithm.
 
We prove that the method obtains notable guarantees, as it recovers the hidden vector with using only one observation. Finally, we study the computational-statistical gap of the PC model. Similar to HDLR, we analyze the presence of OGP for the PC model. We provide strong (first-moment) evidence that again the OGP coincides with the model's computational-statistical gap. For this reason, we conjecture that the OGP provides a fundamental algorithmic barrier for PC as well, and potentially in a generic sense for high-dimensional statistical tasks.
 
Description
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
 
Thesis: Ph. D., Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, 2019
 
Cataloged from student-submitted PDF version of thesis.
 
Includes bibliographical references (pages 289-301).
 
Date issued
2019
URI
https://hdl.handle.net/1721.1/123708
Department
Massachusetts Institute of Technology. Operations Research Center; Sloan School of Management
Publisher
Massachusetts Institute of Technology
Keywords
Operations Research Center.

Collections
  • Doctoral Theses

Browse

All of DSpaceCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

My Account

Login

Statistics

OA StatisticsStatistics by CountryStatistics by Department
MIT Libraries
PrivacyPermissionsAccessibilityContact us
MIT
Content created by the MIT Libraries, CC BY-NC unless otherwise noted. Notify us about copyright concerns.