Reductions of ReLU neural networks to linear neural networks and their applications
Author(s)
Le, Thien
Advisor
Jegelka, Stefanie
Terms of use
Abstract
Deep neural networks are the main subject of interest in theoretical deep learning, which aims to rigorously explain the remarkable performance of these function classes in practice. Although much is understood about deep linear networks (neural networks with all-linear activations), the nonlinearity of the activation makes it very challenging to extend existing techniques to more realistic types of neural networks. In this thesis, we describe reductions of ReLU neural networks to linear neural networks under various general conditions on network architectures, loss functions, and datasets. When such conditions are met, one can adapt techniques from the theory of linear neural networks to study ReLU neural networks. To this end, we provide two applications that put the reduction to use: the first characterizes an implicit regularization behavior of ReLU neural networks trained with gradient descent, and the second characterizes their convergence under gradient-based algorithms.
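To illustrate the general idea behind such a reduction (this is only an illustrative sketch, not the construction developed in the thesis): a ReLU network is piecewise linear, so on any region of input space where the signs of the pre-activations are fixed, it coincides with a linear network whose first-layer weights are masked by that activation pattern. The small NumPy check below makes this concrete for a hypothetical two-layer network; all names, dimensions, and the perturbation scale are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny two-layer ReLU network: x -> W2 @ relu(W1 @ x)  (hypothetical sizes)
W1 = rng.standard_normal((16, 8))
W2 = rng.standard_normal((4, 16))

def relu_net(x):
    return W2 @ np.maximum(W1 @ x, 0.0)

# Fix a reference input and record which hidden units are active there.
x0 = rng.standard_normal(8)
active = (W1 @ x0 > 0).astype(float)      # 0/1 activation pattern at x0

# On the region where this pattern is unchanged, the ReLU network equals
# a linear network whose first-layer rows are masked by the pattern.
W1_lin = active[:, None] * W1             # zero out the inactive rows

def linear_net(x):
    return W2 @ (W1_lin @ x)

# Small perturbations of x0 that stay in the same activation region
# give (numerically) identical outputs for the two networks.
for _ in range(5):
    x = x0 + 1e-4 * rng.standard_normal(8)
    if ((W1 @ x > 0).astype(float) == active).all():  # still in the same region
        assert np.allclose(relu_net(x), linear_net(x))

print("ReLU network agrees with its linear reduction on this activation region.")
```

The point of the sketch is only that, locally, ReLU networks behave like linear networks with masked weights; the thesis identifies conditions under which this kind of correspondence can be exploited globally for implicit regularization and convergence analyses.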
Date issued
2022-02
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology