Musings on Deep Learning: Properties of SGD
(Center for Brains, Minds and Machines (CBMM), 2017-04-04)
[previously titled "Theory of Deep Learning III: Generalization Properties of SGD"] In Theory III we characterize with a mix of theory and experiments the generalization properties of Stochastic Gradient Descent in ...
Theory of Deep Learning III: explaining the non-overfitting puzzle
(arXiv, 2017-12-30)
[This memo is replaced by CBMM Memo 90]
A main puzzle of deep networks revolves around the absence of overfitting despite overparametrization and despite the large capacity demonstrated by zero training error on randomly ...
Theory of Deep Learning IIb: Optimization Properties of SGD
(Center for Brains, Minds and Machines (CBMM), 2017-12-27)
In Theory IIb we characterize with a mix of theory and experiments the optimization of deep convolutional networks by Stochastic Gradient Descent. The main new result in this paper is theoretical and experimental evidence ...