Reductions of ReLU neural networks to linear neural networks and their applications
Author(s)
Le, Thien
Advisor
Jegelka, Stefanie
Terms of use
Abstract
Deep neural networks are the main subject of interest in theoretical deep learning, which aims to rigorously explain the remarkable performance of these function classes in practice. Although much is understood about deep linear networks (neural networks with all-linear activations), the nonlinearity of the activation makes it very challenging to extend existing techniques to more realistic types of neural networks. In this thesis, we describe reductions of ReLU neural networks to linear neural networks under various general conditions on network architectures, loss functions, and datasets. When such conditions are met, one can adapt techniques from the theory of linear neural networks to study ReLU neural networks. To this end, we provide two applications that put the reduction to use: the first characterizes an implicit regularization behavior of ReLU neural networks trained with gradient descent, and the second characterizes their convergence under gradient-based algorithms.
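To illustrate the general idea behind such a reduction (this is only an illustrative sketch, not the construction developed in the thesis): a ReLU network is piecewise linear, so on any region of input space where the signs of the pre-activations are fixed, it coincides with a linear network whose first-layer weights are masked by that activation pattern. The small NumPy check below makes this concrete for a hypothetical two-layer network; all names, dimensions, and the perturbation scale are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny two-layer ReLU network: x -> W2 @ relu(W1 @ x)  (hypothetical sizes)
W1 = rng.standard_normal((16, 8))
W2 = rng.standard_normal((4, 16))

def relu_net(x):
    return W2 @ np.maximum(W1 @ x, 0.0)

# Fix a reference input and record which hidden units are active there.
x0 = rng.standard_normal(8)
active = (W1 @ x0 > 0).astype(float)      # 0/1 activation pattern at x0

# On the region where this pattern is unchanged, the ReLU network equals
# a linear network whose first-layer rows are masked by the pattern.
W1_lin = active[:, None] * W1             # zero out the inactive rows

def linear_net(x):
    return W2 @ (W1_lin @ x)

# Small perturbations of x0 that stay in the same activation region
# give (numerically) identical outputs for the two networks.
for _ in range(5):
    x = x0 + 1e-4 * rng.standard_normal(8)
    if ((W1 @ x > 0).astype(float) == active).all():  # still in the same region
        assert np.allclose(relu_net(x), linear_net(x))

print("ReLU network agrees with its linear reduction on this activation region.")
```

The point of the sketch is only that, locally, ReLU networks behave like linear networks with masked weights; the thesis identifies conditions under which this kind of correspondence can be exploited globally for implicit regularization and convergence analyses.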
Date issued
2022-02
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology