dc.contributor.author: Banburski, Andrzej
dc.contributor.author: De La Torre, Fernanda
dc.contributor.author: Plant, Nishka
dc.contributor.author: Shastri, Ishana
dc.contributor.author: Poggio, Tomaso
dc.date.accessioned: 2021-02-11T16:59:01Z
dc.date.available: 2021-02-11T16:59:01Z
dc.date.issued: 2021-02-09
dc.identifier.uri: https://hdl.handle.net/1721.1/129744
dc.description.abstract: Recent theoretical results show that gradient descent on deep neural networks under exponential loss functions locally maximizes classification margin, which is equivalent to minimizing the norm of the weight matrices under margin constraints. This property of the solution however does not fully characterize the generalization performance. We motivate theoretically and show empirically that the area under the curve of the margin distribution on the training set is in fact a good measure of generalization. We then show that, after data separation is achieved, it is possible to dynamically reduce the training set by more than 99% without significant loss of performance. Interestingly, the resulting subset of “high capacity” features is not consistent across different training runs, which is consistent with the theoretical claim that all training points should converge to the same asymptotic margin under SGD and in the presence of both batch normalization and weight decay. [en_US]
dc.description.sponsorship: This material is based upon work supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216. [en_US]
dc.publisher: Center for Brains, Minds and Machines (CBMM) [en_US]
dc.relation.ispartofseries: CBMM Memo;115
dc.title: Cross-validation Stability of Deep Networks [en_US]
dc.type: Technical Report [en_US]
dc.type: Working Paper [en_US]
dc.type: Other [en_US]
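
The abstract above points to the area under the curve of the training-set margin distribution as a measure of generalization. As a rough, hypothetical illustration only (not the memo's exact construction, which works with margins normalized via the weight-matrix norms under the exponential loss), the NumPy sketch below computes per-example multiclass margins from raw classifier scores and then a trapezoidal area under the sorted-margin curve. Function names such as multiclass_margins and margin_distribution_auc are placeholders introduced here for illustration.

import numpy as np

def multiclass_margins(logits, labels):
    # Margin of each example: true-class score minus the best competing
    # class score. A positive margin means the example is classified correctly.
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    idx = np.arange(len(labels))
    true_scores = logits[idx, labels]
    rivals = logits.copy()
    rivals[idx, labels] = -np.inf  # exclude the true class from the competition
    return true_scores - rivals.max(axis=1)

def margin_distribution_auc(margins):
    # Sort the margins and integrate them over the normalized example index
    # with the trapezoid rule; a larger area means larger typical margins.
    m = np.sort(np.asarray(margins, dtype=float))
    x = np.linspace(0.0, 1.0, num=len(m))
    return np.trapz(m, x)

# Toy usage with random scores: 8 examples, 3 classes.
rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 3))
labels = rng.integers(0, 3, size=8)
print(margin_distribution_auc(multiclass_margins(logits, labels)))

In practice one would feed in the network's outputs on the training set; how the margins are normalized (e.g., by the product of layer norms) should follow the memo rather than this sketch.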

