Principled approaches to robust machine learning and beyond

Author(s)

Li, Jerry Zheng

Other Contributors

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.

Advisor

Ankur Moitra.

Terms of use

MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582

Abstract

As we apply machine learning to more and more important tasks, it becomes increasingly important that these algorithms are robust to systematic, or worse, malicious, noise. Despite considerable interest and over sixty years of research, no efficient algorithms were known to be robust to such noise in high-dimensional settings for some of the most fundamental statistical tasks. In this thesis we devise two novel, but similarly inspired, algorithmic paradigms for estimation in high dimensions in the presence of a small number of adversarially added data points. Both are the first efficient algorithms that achieve (nearly) optimal error bounds for a number of fundamental statistical tasks such as mean estimation and covariance estimation. The goal of this thesis is to present these two frameworks in a clean and unified manner. We show that these insights also have applications for other problems in learning theory. Specifically, we show that these algorithms can be combined with the powerful Sum-of-Squares hierarchy to yield improvements for clustering high-dimensional Gaussian mixture models, the first such improvement in over fifteen years of research. Going full circle, we show that Sum-of-Squares can also be used to improve error rates for robust mean estimation. Not only are these algorithms of interest theoretically, but we demonstrate empirically that we can use these insights in practice to uncover patterns in high-dimensional data that were previously masked by noise. Based on our algorithms, we give new implementations for robust PCA, new defenses against data poisoning attacks on stochastic optimization, and new defenses against watermarking attacks on deep nets. In all of these tasks, we demonstrate on both synthetic and real data sets that our performance is substantially better than the state of the art, often detecting most or all corruptions when previous methods could not reliably detect any.

Description

Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018.

This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.

Cataloged from PDF version of thesis.

Includes bibliographical references (pages 305-320).

Date issued

2018

URI

http://hdl.handle.net/1721.1/120382

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Keywords

Electrical Engineering and Computer Science.

Collections

Doctoral Theses