A perturbative analysis of stochastic descent
Author(s)
Tenka, Samuel C.
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Joshua B. Tenenbaum.
Abstract
We analyze stochastic gradient descent (SGD) at small learning rates. Unlike prior analyses based on stochastic differential equations, our theory models discrete time and hence non-Gaussian noise. We illustrate our theory by discussing four of its corollaries: we (A) generalize the Akaike information criterion (AIC) to a smooth estimator of overfitting, hence enabling gradient-based model selection; (B) show how non-stochastic GD with a modified loss function may emulate SGD; (C) prove that gradient noise systematically pushes SGD toward flatter minima; and (D) characterize when and why flat minima overfit less than other minima.
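To make corollary (C) concrete, here is a minimal numerical sketch (not taken from the thesis): SGD on a one-dimensional loss with two equal-depth minima, one sharp and one flat, where the minibatch gradient is modeled as the full gradient plus Gaussian noise whose standard deviation grows with the local curvature. The loss, the curvature constants, the curvature-proportional noise model, and the learning rate are all illustrative assumptions; the thesis derives the flatness drift perturbatively rather than by simulation.

```python
# Toy sketch (not from the thesis) of corollary (C): state-dependent
# gradient noise drives SGD toward flatter minima. All constants below
# are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

K_SHARP, K_FLAT = 25.0, 1.0  # half the curvature of each basin
ETA = 0.01                   # learning rate
NOISE = 0.3                  # assumption: noise std grows with curvature

def grad_and_curv(theta):
    """Gradient and curvature of L(theta) = min(K_SHARP*(theta+1)^2,
    K_FLAT*(theta-1)^2): equal-depth minima at -1 (sharp) and +1 (flat),
    with an equal-height barrier between them at theta = -2/3."""
    sharp = K_SHARP * (theta + 1.0) ** 2 < K_FLAT * (theta - 1.0) ** 2
    grad = np.where(sharp, 2 * K_SHARP * (theta + 1.0), 2 * K_FLAT * (theta - 1.0))
    curv = np.where(sharp, 2 * K_SHARP, 2 * K_FLAT)
    return grad, curv

def run(n_chains=200, n_steps=20_000, stochastic=True):
    theta = rng.uniform(-2.0, 2.0, size=n_chains)
    for _ in range(n_steps):
        g, c = grad_and_curv(theta)
        if stochastic:
            # toy SGD noise: std proportional to local curvature
            g = g + NOISE * c * rng.standard_normal(n_chains)
        theta -= ETA * g
    return np.mean(theta > -2.0 / 3.0)  # fraction ending in the flat basin

print("flat-basin fraction, GD :", run(stochastic=False))  # ~0.67 (set by initialization)
print("flat-basin fraction, SGD:", run(stochastic=True))   # ~1.0
```

Noise-free GD simply falls into whichever basin it starts in, while the noisy chains almost all end in the flat basin: the larger noise near the sharp minimum repeatedly kicks iterates over the equal-height barrier, whereas the flat basin's small noise essentially never does.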
Description
Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, September 2020. Cataloged from the student-submitted PDF version of the thesis. Includes bibliographical references.
Date issued
2020
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.