DSpace@MIT

On optimization and scalability in deep learning

Author(s)
Kawaguchi, Kenji, Ph.D., Massachusetts Institute of Technology
Download
1227519815-MIT.pdf (8.577 MB)
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Leslie P. Kaelbling.
Terms of use
MIT theses may be protected by copyright. Please reuse MIT thesis content according to the MIT Libraries Permissions Policy, available at http://dspace.mit.edu/handle/1721.1/7582.
Abstract
Deep neural networks have achieved significant empirical success in many fields, including computer vision, machine learning, and artificial intelligence. Along with this empirical success, deep learning has been shown theoretically to be attractive in terms of its expressive power: neural networks with one hidden layer can approximate any continuous function, and deeper neural networks can approximate certain classes of functions with fewer parameters. Expressivity theory thus guarantees that, for networks of certain sizes, there exist parameter vectors that approximate a desired target function. It does not, however, ensure that such a parameter vector can be found efficiently when the network is trained. Optimization is one of the key steps in deep learning because learning from data is achieved through optimization, i.e., the process of adjusting the parameters of a deep neural network so that the network is consistent with the data.
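
Concretely, this optimization step is usually posed as empirical risk minimization over the network parameters; the notation below is a generic illustration of that standard setup, not taken from the thesis:

\min_{\theta} \; L(\theta) \;=\; \frac{1}{n} \sum_{i=1}^{n} \ell\bigl(f(x_i; \theta),\, y_i\bigr)

where f(\cdot; \theta) is the network, (x_i, y_i) are the n training examples, and \ell is a loss such as the squared error. Because f composes several layers with nonlinear activations, L is in general non-convex in \theta.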
 
This process typically requires non-convex optimization, which in general does not scale to high-dimensional problems. Indeed, without additional assumptions on the architecture, optimization of a neural network is not scalable in general. This thesis studies the non-convex optimization of deep neural networks with various architectures, focusing on fundamental bottlenecks to scalability such as suboptimal local minima and saddle points. In particular, for deep neural networks, we present guarantees on the values of local minima and critical points, as well as on the points found by gradient descent. We prove that mild over-parameterization, of a degree common in practice, ensures that gradient descent finds a global minimum in the non-convex optimization of deep neural networks.
 
Furthermore, even without over-parameterization, we show, both theoretically and empirically, that increasing the number of parameters improves the values of critical points and local minima towards the global minimum value. We also prove theoretical guarantees on the values of local minima of residual neural networks. Moreover, this thesis presents a unified theory for analyzing the critical points and local minima of a variety of deep neural networks beyond these specific architectures. Together, these results suggest that, although scalability can fail in the theoretical worst case and for worst-case architectures, in practice optimization scales well to large problems for many useful architectures.
 
Description
Thesis: Ph.D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, September 2020
 
Cataloged from student-submitted PDF of thesis.
 
Includes bibliographical references (pages 253-260).
 
Date issued
2020
URI
https://hdl.handle.net/1721.1/129255
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.

Collections
  • Doctoral Theses
