Optimization Theory and Machine Learning Practice: Mind the Gap
Author(s)
Zhang, Jingzhao
Advisor
Sra, Suvrit
Jadbabaie, Ali
Abstract
Machine learning is a technology for extracting predictive models from data so that predictions generalize to unobserved data. Selecting a good model from a known dataset requires optimization: an optimization procedure produces a variable in a constraint set that minimizes an objective. This formulation subsumes many machine learning pipelines, including neural network training, which is the main testing ground for the theoretical analyses in this thesis.
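As a concrete reading of this formulation (the notation below is illustrative, not taken from the thesis), model selection can be written as a constrained minimization problem:

\[
\min_{x \in \mathcal{C}} f(x),
\]

where \(x\) denotes the model parameters (e.g., neural network weights), \(\mathcal{C}\) the constraint set, and \(f\) the training objective evaluated on the known dataset.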
Among optimization algorithms, gradient methods have become dominant in deep learning because they scale to high dimensions and pair naturally with backpropagation. Yet despite their popularity, our theoretical understanding of such algorithms in the machine learning context remains insufficient. On one hand, within the current theoretical framework, most upper and lower bounds match, and the theory problems appear solved. On the other hand, theoretical analyses rarely yield algorithms that are empirically faster than those found by practitioners. In this thesis, we review the theoretical analyses of gradient methods and point out the discrepancy between theory and practice. We then explain why this mismatch arises and propose initial solutions by developing theoretical analyses driven by empirical observations.
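The gradient methods discussed above can be illustrated by the following minimal sketch (an assumption-laden example, not the thesis's algorithm; the objective, step size, and function names are ours):

```python
import numpy as np

def gradient_descent(grad_f, x0, step_size=0.1, num_steps=100):
    """Plain gradient descent: x_{t+1} = x_t - step_size * grad_f(x_t)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_steps):
        x = x - step_size * grad_f(x)  # step against the gradient direction
    return x

# Illustrative use: minimize f(x) = ||x||^2, whose gradient is 2x.
x_star = gradient_descent(lambda x: 2.0 * x, x0=np.array([3.0, -4.0]))
print(x_star)  # converges toward the minimizer at the origin
```

In deep learning practice, grad_f would be computed by backpropagation over a loss on training data, which is what makes gradient methods scale to high-dimensional models.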
Date issued
2022-02
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology