DSpace@MIT
An Empirical and Theoretical Analysis of the Role of Depth in Convolutional Neural Networks

Author(s)
Nichani, Eshaan
Thesis PDF (1.610Mb)
Advisor
Uhler, Caroline
Terms of use
In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/
Abstract
While over-parameterized neural networks are capable of perfectly fitting (interpolating) the training data, these networks often still perform well on test data, contradicting classical learning theory. Recent work explained this phenomenon by introducing the double descent curve, showing that increasing model capacity past the interpolation threshold can lead to a decrease in test error. In line with this, it was recently shown both empirically and theoretically that increasing neural network capacity through width leads to double descent. In this thesis, we analyze the effect of increasing depth on test performance. In contrast to what is observed for increasing width, we demonstrate through a variety of classification experiments on CIFAR10 and ImageNet32 using fully-convolutional networks, ResNets, and the convolutional neural tangent kernel (CNTK) that test error is U-shaped in depth and in fact increases beyond a critical depth. To better understand this phenomenon, we conduct a theoretical analysis of the impact of depth on generalization in linear convolutional networks of infinite width. In particular, we derive the feature map of the linear CNTK for arbitrary depth and identify the depth that minimizes the bias and variance terms of the excess risk. The findings of this thesis imply that increasing the depth of interpolating convolutional networks can in fact lead to worse generalization.
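The empirical protocol described in the abstract can be sketched as a depth sweep: train networks of increasing depth, record test error for each, and locate the critical depth at which error is minimized. The sketch below is a minimal illustration, not the thesis's actual experiment code; `train_and_eval` is a hypothetical stand-in for a real training run (e.g., a fully-convolutional network on CIFAR10), and a synthetic U-shaped error curve is substituted so the sketch runs without data or GPUs.

```python
def depth_sweep(depths, train_and_eval):
    """Evaluate test error at each depth and return (errors, critical_depth).

    `train_and_eval` is assumed to map a depth to a scalar test error;
    the critical depth is the one minimizing that error.
    """
    errors = [train_and_eval(d) for d in depths]
    critical_depth = depths[errors.index(min(errors))]
    return errors, critical_depth


def synthetic_test_error(depth):
    # Synthetic stand-in for a real training run: error falls with depth,
    # then rises past depth 5, mimicking the U-shaped curve the thesis
    # reports for interpolating convolutional networks.
    return 0.10 + 0.01 * (depth - 5) ** 2


if __name__ == "__main__":
    depths = list(range(1, 11))
    errors, critical = depth_sweep(depths, synthetic_test_error)
    print(critical)  # depth minimizing the synthetic test error
```

In a real experiment, the synthetic curve would be replaced by training and evaluating an actual model at each depth; the sweep-and-argmin structure is unchanged.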
Date issued
2021-06
URI
https://hdl.handle.net/1721.1/139174
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses
