| dc.contributor.advisor | Leslie P. Kaelbling. | en_US |
| dc.contributor.author | Kawaguchi, Kenji, Ph. D. Massachusetts Institute of Technology. | en_US |
| dc.contributor.other | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. | en_US |
| dc.date.accessioned | 2021-01-06T19:35:52Z | |
| dc.date.available | 2021-01-06T19:35:52Z | |
| dc.date.copyright | 2020 | en_US |
| dc.date.issued | 2020 | en_US |
| dc.identifier.uri | https://hdl.handle.net/1721.1/129255 | |
| dc.description | Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, September, 2020 | en_US |
| dc.description | Cataloged from student-submitted PDF of thesis. | en_US |
| dc.description | Includes bibliographical references (pages 253-260). | en_US |
| dc.description.abstract | Deep neural networks have achieved significant empirical success in many fields, including computer vision, machine learning, and artificial intelligence. Alongside this empirical success, deep learning has also been shown to be theoretically attractive in terms of its expressive power. That is, neural networks with one hidden layer can approximate any continuous function, and deeper neural networks can approximate certain classes of functions with fewer parameters. Expressivity theory states that there exist optimal parameter vectors for neural networks of certain sizes to approximate desired target functions. However, expressivity theory does not ensure that we can find such an optimal vector efficiently when optimizing a neural network. Optimization is a key step in deep learning because learning from data is achieved by optimizing the parameters of a deep neural network to make the network consistent with the data. | en_US |
| dc.description.abstract | This process typically requires non-convex optimization, which in general is not scalable to high-dimensional problems. Indeed, optimization of a neural network is not scalable without additional assumptions on its architecture. This thesis studies the non-convex optimization of deep neural networks with various architectures, focusing on fundamental bottlenecks to scalability such as suboptimal local minima and saddle points. In particular, for deep neural networks, we present guarantees on the values of local minima and critical points, as well as on the points found by gradient descent. We prove that mild over-parameterization of a practical degree ensures that gradient descent finds a global minimum in the non-convex optimization of deep neural networks. | en_US |
| dc.description.abstract | Furthermore, even without over-parameterization, we show, both theoretically and empirically, that increasing the number of parameters improves the values of critical points and local minima toward the global minimum value. We also prove theoretical guarantees on the values of local minima of residual neural networks. Moreover, this thesis presents a unified theory for analyzing the critical points and local minima of various deep neural networks beyond these specific architectures. These results suggest that, whereas scalability remains an issue in the theoretical worst case and for worst-case architectures, we can avoid the issue and scale well to large problems with many architectures that are useful in practice. | en_US |
| dc.description.statementofresponsibility | by Kenji Kawaguchi. | en_US |
| dc.format.extent | 260 pages | en_US |
| dc.language.iso | eng | en_US |
| dc.publisher | Massachusetts Institute of Technology | en_US |
| dc.rights | MIT theses may be protected by copyright. Please reuse MIT thesis content according to the MIT Libraries Permissions Policy, which is available through the URL provided. | en_US |
| dc.rights.uri | http://dspace.mit.edu/handle/1721.1/7582 | en_US |
| dc.subject | Electrical Engineering and Computer Science. | en_US |
| dc.title | On optimization and scalability in deep learning | en_US |
| dc.type | Thesis | en_US |
| dc.description.degree | Ph. D. | en_US |
| dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | en_US |
| dc.identifier.oclc | 1227519815 | en_US |
| dc.description.collection | Ph.D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science | en_US |
| dspace.imported | 2021-01-06T19:35:51Z | en_US |
| mit.thesis.degree | Doctoral | en_US |
| mit.thesis.department | EECS | en_US |
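
The abstract above states that, under mild over-parameterization, gradient descent finds a global minimum of the non-convex training objective of a deep neural network. The snippet below is a minimal illustrative sketch of that phenomenon, not the thesis's construction: it runs full-batch gradient descent on a heavily over-parameterized two-layer ReLU network, whose training loss typically approaches zero. The width, data size, learning rate, and iteration count are arbitrary illustrative choices.

```python
# Minimal sketch (assumes PyTorch is available): full-batch gradient descent
# on an over-parameterized two-layer ReLU network fit to random regression data.
# All hyperparameters here are illustrative, not taken from the thesis.
import torch

torch.manual_seed(0)
n, d, width = 20, 5, 1000            # few samples, many hidden units (over-parameterized)
X = torch.randn(n, d)
y = torch.randn(n, 1)                # random regression targets

model = torch.nn.Sequential(
    torch.nn.Linear(d, width),
    torch.nn.ReLU(),
    torch.nn.Linear(width, 1),
)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for step in range(2001):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(X), y)
    loss.backward()
    opt.step()
    if step % 500 == 0:
        print(f"step {step:4d}  training loss {loss.item():.6f}")

# The training loss typically decreases toward zero, consistent with the idea
# that gradient descent can reach a global minimum of the (non-convex) training
# objective when the network is sufficiently over-parameterized.
```

In this sketch the number of hidden units far exceeds the number of training samples, which is the regime the abstract refers to as over-parameterization; with fewer hidden units the same procedure may instead stall at a higher training loss.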