
dc.contributor.advisor: Sra, Suvrit
dc.contributor.advisor: Jadbabaie, Ali
dc.contributor.author: Yun, Chulhee
dc.date.accessioned: 2022-02-07T15:15:54Z
dc.date.available: 2022-02-07T15:15:54Z
dc.date.issued: 2021-09
dc.date.submitted: 2021-09-21T19:30:42.053Z
dc.identifier.uri: https://hdl.handle.net/1721.1/139962
dc.description.abstract: Deep learning has produced impressive empirical breakthroughs, but many theoretical questions remain unsolved. For example, despite the nonconvexity of training objectives, deep neural networks can be reliably trained to fully memorize training datasets, yet perform very well on unseen test data. Although progress has been made in recent years, the gap between theory and practice remains wide open; this thesis takes a step towards closing this gap. First, we discuss the optimization landscape of neural networks. We prove the existence of spurious local minima for general datasets and activation functions, which suggests that the convergence of optimization methods on neural networks cannot be explained solely via the training objectives. Next, we establish tight bounds on the memorization capacity of ReLU networks. We present results showing that width-$\Theta(\sqrt{n})$ ReLU networks can memorize any $n$ data points, bringing the existing width requirement of $n$ down to a much more realistic number. The third part discusses implicit bias in training neural networks, an area that seeks to understand generalization through the optimization trajectory. Through a unified analysis using a tensor representation of neural networks, we show how different linear neural network architectures lead to different global minima in overparameterized regimes. Lastly, we address a major theory-practice gap in stochastic finite-sum optimization: practical algorithms shuffle and iterate through the component indices, while most theoretical analyses assume the indices are sampled uniformly at random. To close this gap, we develop tight convergence rates for shuffling-based SGD that are faster than the rate of its uniform-sampling counterpart. (A toy sketch contrasting these two sampling schemes appears after this record.)
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright MIT
dc.rights.uri: http://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Optimization for Deep Learning: Bridging the Theory-Practice Gap
dc.type: Thesis
dc.description.degree: Ph.D.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.orcid: https://orcid.org/0000-0002-3504-4690
mit.thesis.degree: Doctoral
thesis.degree.name: Doctor of Philosophy
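
The abstract contrasts shuffling-based SGD with the uniform-sampling variant assumed by most classical analyses. The minimal sketch below is not taken from the thesis; the toy least-squares problem and the names n, d, lr, epochs are illustrative assumptions. It only shows the sampling difference: drawing an index uniformly with replacement at every step versus taking a fresh random permutation of all indices each epoch.

```python
# Toy sketch (not the thesis's algorithm): contrast the two index-sampling
# schemes for finite-sum SGD on f(w) = (1/n) * sum_i 0.5 * (x_i^T w - y_i)^2.
# All problem sizes and hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d, lr, epochs = 100, 5, 0.01, 50
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true

def grad_i(w, i):
    """Gradient of the i-th component 0.5 * (x_i^T w - y_i)^2."""
    return (X[i] @ w - y[i]) * X[i]

def sgd_uniform(w, epochs):
    """With-replacement SGD: each step draws an index uniformly at random
    (the scheme most theoretical analyses assume)."""
    for _ in range(epochs * n):
        i = rng.integers(n)
        w = w - lr * grad_i(w, i)
    return w

def sgd_shuffling(w, epochs):
    """Shuffling-based SGD (random reshuffling): each epoch permutes the
    indices and visits every component exactly once
    (the scheme practical implementations use)."""
    for _ in range(epochs):
        for i in rng.permutation(n):
            w = w - lr * grad_i(w, i)
    return w

w0 = np.zeros(d)
for name, algo in [("uniform", sgd_uniform), ("shuffling", sgd_shuffling)]:
    w = algo(w0.copy(), epochs)
    print(f"{name:9s} final error: {np.linalg.norm(w - w_true):.2e}")
```

Both loops take the same number of stochastic gradient steps; the shuffling variant simply guarantees each component is used exactly once per epoch, which is the without-replacement setting whose convergence rates the thesis studies.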

