Rethinking Algorithm Design for Modern Challenges in Data Science
Author(s)
Chen, Sitan
Advisor
Moitra, Ankur
Abstract
Heuristics centered around gradient descent and function approximation by neural networks have proven wildly successful for a number of fundamental data science tasks, so much so that it is easy to lose sight of how far we are from understanding why they work so well.
Can we design learning algorithms with rigorous guarantees to either match, outperform, or augment these heuristics? In the first part of this thesis, we present new provable algorithms for learning rich function classes like neural networks in natural learning settings where gradient-based methods provably fail. Our algorithms are based on a new general recipe that we call filtered PCA for dimensionality reduction in multi-index models.
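To give a flavor of the filtered-PCA idea (a minimal illustrative sketch, not the thesis's actual algorithm; the single-index model, the particular filter, and all names below are assumptions): in a multi-index model y = f(Wx) with Gaussian inputs, one can restrict attention to samples whose labels pass a filter and run PCA on the recentered second-moment matrix of those samples, whose range then lies in the relevant low-dimensional subspace.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20000, 10

# Hypothetical single-index model: y depends on x only through w . x.
w = np.zeros(d)
w[0] = 1.0
X = rng.standard_normal((n, d))
y = (np.abs(X @ w) > 1.0).astype(float)   # the "filter": keep samples with large |w . x|

# Filtered second moment, recentered by the filter's mass p = P(y = 1).
# For Gaussian x, E[x x^T 1{y=1}] - p I has its range inside span(w).
p = y.mean()
M = (X * y[:, None]).T @ X / n - p * np.eye(d)

# The top principal direction of the filtered moment recovers w up to sign.
eigvals, eigvecs = np.linalg.eigh(M)
v = eigvecs[:, -1]
print(abs(v @ w))   # alignment with the hidden direction
```

The point of the filter is that it breaks the rotational symmetry of the Gaussian input only along the hidden subspace, so plain PCA on the filtered data sees signal exactly there.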
Asking for rigorous guarantees not only helps uncover general mechanisms that make learning tractable, but also lets us be certain that our algorithms are resilient to the demands of modern data. In the second part of this thesis, we study challenging settings where even a constant fraction of the data may have been corrupted, and we develop new iterative reweighting schemes for mitigating corruptions in the context of distribution estimation, linear regression, and online learning. A distinctive feature of many of our results here is that they make minimal assumptions on the data-generating process.
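As a toy illustration of iterative reweighting for robust mean estimation (a sketch under simplifying assumptions, not the thesis's algorithms: Gaussian inliers, identical planted outliers, and hypothetical parameter choices): repeatedly downweight samples that project heavily onto the top eigenvector of the weighted empirical covariance, until no direction shows abnormally large variance.

```python
import numpy as np

def filtered_mean(X, var_bound=1.5, max_iter=100):
    """Iterative reweighting: shrink the weights of points responsible for
    excess variance along the top eigenvector, then re-estimate."""
    n = len(X)
    w = np.ones(n)
    for _ in range(max_iter):
        w_n = w / w.sum()
        mu = w_n @ X
        cov = ((X - mu).T * w_n) @ (X - mu)      # weighted covariance
        eigvals, eigvecs = np.linalg.eigh(cov)
        if eigvals[-1] <= var_bound:             # no suspicious direction left
            break
        tau = ((X - mu) @ eigvecs[:, -1]) ** 2   # per-point outlier scores
        w = w * (1.0 - tau / tau.max())          # soft downweighting
    return (w / w.sum()) @ X

rng = np.random.default_rng(1)
inliers = rng.standard_normal((1800, 5)) + 3.0    # true mean = (3, ..., 3)
outliers = np.tile([23.0, 3, 3, 3, 3], (200, 1))  # 10% corruptions
X = np.vstack([inliers, outliers])

print(np.linalg.norm(X.mean(0) - 3.0))            # naive mean: badly off
print(np.linalg.norm(filtered_mean(X) - 3.0))     # reweighted mean: close to truth
```

The corruptions inflate the variance in the direction they shift the mean, so the top eigenvector points at them; points scoring highest along it lose weight fastest.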
In certain situations, however, data may be difficult to work with not because it has been corrupted, but because it comes from a number of heterogeneous sources. In the third part of this thesis, we give improved algorithms for two popular models of heterogeneity, mixtures of product distributions and mixtures of linear regressions, by developing novel ways of using Fourier approximation, the method of moments, and combinations thereof to extract latent structure in the data.
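To illustrate the method-of-moments flavor in the simplest heterogeneous setting (an illustrative sketch with assumed parameters, not the algorithms developed in the thesis): for a symmetric two-component mixture of linear regressions y = ±⟨w, x⟩ + noise with Gaussian x, the latent sign cancels in first-order moments, but the matrix E[y² x xᵀ] equals (‖w‖² + σ²)I + 2wwᵀ, so its top eigenvector recovers the direction of w and the eigenvalue gap recovers ‖w‖².

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, sigma = 50000, 8, 0.5
w = np.array([2.0, 0, 0, 0, 0, 0, 0, 0])         # hidden regressor, ||w|| = 2

X = rng.standard_normal((n, d))
signs = rng.choice([-1.0, 1.0], size=n)          # latent component labels
y = signs * (X @ w) + sigma * rng.standard_normal(n)

# Method of moments: E[y^2 x x^T] = (||w||^2 + sigma^2) I + 2 w w^T for Gaussian x.
M = (X * (y**2)[:, None]).T @ X / n
eigvals, eigvecs = np.linalg.eigh(M)

direction = eigvecs[:, -1]                        # w up to sign
scale = np.sqrt((eigvals[-1] - np.median(eigvals)) / 2)
w_hat = scale * direction
print(min(np.linalg.norm(w_hat - w), np.linalg.norm(w_hat + w)))  # recovery error
```

The sign ambiguity is inherent here: both ±w generate the same data, so any estimator can only recover w up to sign.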
In the final part of this thesis, we ask whether these and related ideas in data science can help shed light on problems in the sciences. We give two such applications: one rigorously pinning down the much-debated diffraction limit in classical optics, and the other establishing memory-sample tradeoffs for quantum state certification.
Date issued
2021-09
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology