Optimization Theory and Machine Learning Practice: Mind the Gap
Author(s)
Zhang, Jingzhao
Advisor
Sra, Suvrit
Jadbabaie, Ali
Abstract
Machine learning is a technology for extracting predictive models from data so that predictions generalize to unobserved data. Selecting a good model from a known dataset requires optimization: an optimization procedure produces a variable in a constraint set that minimizes an objective. This formulation subsumes many machine learning pipelines, including neural network training, which is the main testing ground for the theoretical analyses in this thesis.
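As a concrete reading of this formulation (the notation below is illustrative, not taken from the thesis), model selection can be written as a constrained minimization problem:

\[
\min_{x \in \mathcal{C}} f(x),
\]

where \(x\) denotes the model parameters (e.g., neural network weights), \(\mathcal{C}\) the constraint set, and \(f\) the training objective evaluated on the known dataset.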
Among optimization algorithms, gradient methods have become dominant in deep learning because they scale to high dimensions and pair naturally with backpropagation. Yet despite their popularity, our theoretical understanding of such algorithms in the machine learning context remains insufficient. On one hand, within the current theoretical framework, most upper and lower bounds match, and the theory problems appear solved. On the other hand, theoretical analyses rarely yield algorithms that are empirically faster than those found by practitioners. In this thesis, we review the theoretical analyses of gradient methods and point out the discrepancy between theory and practice. We then explain why this mismatch arises and propose initial solutions by developing theoretical analyses driven by empirical observations.
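The gradient methods discussed above can be illustrated by the following minimal sketch (an assumption-laden example, not the thesis's algorithm; the objective, step size, and function names are ours):

```python
import numpy as np

def gradient_descent(grad_f, x0, step_size=0.1, num_steps=100):
    """Plain gradient descent: x_{t+1} = x_t - step_size * grad_f(x_t)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_steps):
        x = x - step_size * grad_f(x)  # step against the gradient direction
    return x

# Illustrative use: minimize f(x) = ||x||^2, whose gradient is 2x.
x_star = gradient_descent(lambda x: 2.0 * x, x0=np.array([3.0, -4.0]))
print(x_star)  # converges toward the minimizer at the origin
```

In deep learning practice, grad_f would be computed by backpropagation over a loss on training data, which is what makes gradient methods scale to high-dimensional models.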
Date issued
2022-02
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology