dc.contributor.author | Zhang, Chiyuan | |
dc.contributor.author | Liao, Qianli | |
dc.contributor.author | Rakhlin, Alexander | |
dc.contributor.author | Sridharan, Karthik | |
dc.contributor.author | Miranda, Brando | |
dc.contributor.author | Golowich, Noah | |
dc.contributor.author | Poggio, Tomaso | |
dc.date.accessioned | 2017-04-04T21:32:29Z | |
dc.date.available | 2017-04-04T21:32:29Z | |
dc.date.issued | 2017-04-04 | |
dc.identifier.uri | http://hdl.handle.net/1721.1/107841 | |
dc.description.abstract | [previously titled "Theory of Deep Learning III: Generalization Properties of SGD"] In Theory III we characterize, with a mix of theory and experiments, the generalization properties of Stochastic Gradient Descent (SGD) in overparametrized deep convolutional networks. We show that SGD selects with high probability solutions that 1) have zero (or small) empirical error, 2) are degenerate, as shown in Theory II, and 3) have maximum generalization. | en_US |
dc.description.sponsorship | This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216. H.M. is supported in part by ARO Grant W911NF-15-1-0385. | en_US |
dc.language.iso | en_US | en_US |
dc.publisher | Center for Brains, Minds and Machines (CBMM) | en_US |
dc.relation.ispartofseries | CBMM Memo Series;067 | |
dc.rights | Attribution-NonCommercial-ShareAlike 3.0 United States | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/3.0/us/ | * |
dc.title | Musings on Deep Learning: Properties of SGD | en_US |
dc.type | Technical Report | en_US |
dc.type | Working Paper | en_US |
dc.type | Other | en_US |
dc.audience.educationlevel | | |