Show simple item record

dc.contributor.author  Zhang, Chiyuan
dc.contributor.author  Liao, Qianli
dc.contributor.author  Rakhlin, Alexander
dc.contributor.author  Sridharan, Karthik
dc.contributor.author  Miranda, Brando
dc.contributor.author  Golowich, Noah
dc.contributor.author  Poggio, Tomaso
dc.date.accessioned  2017-04-04T21:32:29Z
dc.date.available  2017-04-04T21:32:29Z
dc.date.issued  2017-04-04
dc.identifier.uri  http://hdl.handle.net/1721.1/107841
dc.description.abstract  [previously titled "Theory of Deep Learning III: Generalization Properties of SGD"] In Theory III we characterize with a mix of theory and experiments the generalization properties of Stochastic Gradient Descent in overparametrized deep convolutional networks. We show that Stochastic Gradient Descent (SGD) selects with high probability solutions that 1) have zero (or small) empirical error, 2) are degenerate as shown in Theory II and 3) have maximum generalization.  en_US
dc.description.sponsorship  This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216. H.M. is supported in part by ARO Grant W911NF-15-1-0385.  en_US
dc.language.iso  en_US  en_US
dc.publisher  Center for Brains, Minds and Machines (CBMM)  en_US
dc.relation.ispartofseries  CBMM Memo Series;067
dc.rights  Attribution-NonCommercial-ShareAlike 3.0 United States  *
dc.rights.uri  http://creativecommons.org/licenses/by-nc-sa/3.0/us/  *
dc.title  Musings on Deep Learning: Properties of SGD  en_US
dc.type  Technical Report  en_US
dc.type  Working Paper  en_US
dc.type  Other  en_US
dc.audience.educationlevel
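The abstract above refers to stochastic gradient descent on overparametrized models. For reference, here is a minimal, hypothetical sketch of the plain SGD update w <- w - lr * grad(loss_i(w)) on an overparametrized least-squares toy problem; this is illustrative only, not the paper's networks, data, or code.

    import numpy as np

    # Hypothetical toy setup (not from the memo): linear model, squared loss,
    # with more parameters (d) than training points (n), i.e. overparametrized.
    rng = np.random.default_rng(0)
    n, d = 20, 100
    X = rng.standard_normal((n, d))
    y = rng.standard_normal(n)

    w = np.zeros(d)
    lr = 0.005
    for step in range(5000):
        i = rng.integers(n)                  # sample one training point at random
        grad = (X[i] @ w - y[i]) * X[i]      # gradient of 0.5 * (x_i . w - y_i)^2
        w -= lr * grad                       # SGD update: w <- w - lr * grad

    print("train mse:", np.mean((X @ w - y) ** 2))

With n = 20 points and d = 100 parameters the system is underdetermined, so SGD can drive the empirical error to (near) zero, matching property 1) claimed in the abstract.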

