Musings on Deep Learning: Properties of SGD
Author(s)
Zhang, Chiyuan; Liao, Qianli; Rakhlin, Alexander; Sridharan, Karthik; Miranda, Brando; Golowich, Noah; Poggio, Tomaso
Download: CBMM-Memo-067.pdf (5.875 MB)
Abstract
[Previously titled "Theory of Deep Learning III: Generalization Properties of SGD".] In Theory III we characterize, with a mix of theory and experiments, the generalization properties of Stochastic Gradient Descent (SGD) in overparametrized deep convolutional networks. We show that SGD selects with high probability solutions that 1) have zero (or small) empirical error, 2) are degenerate, as shown in Theory II, and 3) have maximum generalization.
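To make the first claim concrete, the sketch below runs plain SGD on an overparametrized least-squares problem (more parameters than training examples) and shows the empirical error being driven to (near) zero. This is only an illustration of the regime the abstract refers to, not the memo's experimental setup; the linear model, dimensions, step size, and epoch count are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Overparametrized setup: more parameters (d) than training examples (n),
# so many weight vectors achieve zero empirical (training) error.
n, d = 20, 100
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true                      # noiseless targets for simplicity

w = np.zeros(d)                     # start SGD from the zero vector
lr = 0.005                          # illustrative step size

for epoch in range(1000):
    for i in rng.permutation(n):    # one example at a time (stochastic)
        grad = (X[i] @ w - y[i]) * X[i]   # gradient of 0.5*(x_i.w - y_i)^2
        w -= lr * grad

train_mse = np.mean((X @ w - y) ** 2)
print(f"training MSE: {train_mse:.2e}")   # approaches zero (interpolation)
```

In this linear setting, each SGD update is a multiple of some training input, so the iterate stays in the span of the inputs; with a sufficiently small step size it therefore converges to the minimum-norm interpolating solution, one concrete (toy) sense in which SGD "selects" a particular zero-error solution among many.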
Date issued
2017-04-04
Publisher
Center for Brains, Minds and Machines (CBMM)
Series/Report no.
CBMM Memo Series; 067