Show simple item record

dc.contributor.author       Galanti, Tomer
dc.contributor.author       Poggio, Tomaso
dc.date.accessioned         2022-03-28T20:03:08Z
dc.date.available           2022-03-28T20:03:08Z
dc.date.issued              2022-03-28
dc.identifier.uri           https://hdl.handle.net/1721.1/141380
dc.description.abstract     We analyze deep ReLU neural networks trained with mini-batch stochastic gradient descent (SGD) and weight decay. We prove that the source of the SGD noise is an implicit low-rank constraint across all of the weight matrices within the network. Furthermore, we show, both theoretically and empirically, that when training a neural network using SGD with a small batch size, the resulting weight matrices are expected to be of small rank. Our analysis relies on a minimal set of assumptions, and the networks may include convolutional layers, residual connections, and batch normalization layers.  en_US
dc.description.sponsorship  This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.  en_US
dc.publisher                Center for Brains, Minds and Machines (CBMM)  en_US
dc.relation.ispartofseries  CBMM Memo;134
dc.title                    SGD Noise and Implicit Low-Rank Bias in Deep Neural Networks  en_US
dc.type                     Article  en_US
dc.type                     Technical Report  en_US
dc.type                     Working Paper  en_US
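
The abstract's central empirical claim, that mini-batch SGD with weight decay and a small batch size drives the weight matrices toward small rank, can be checked with a short experiment. The following is a minimal sketch in Python (PyTorch); it is not the memo's experimental setup, and the architecture, synthetic data, and hyperparameters are illustrative assumptions. It trains a small deep ReLU network with a small batch size and nonzero weight decay, then reports the effective rank (the number of singular values above a relative threshold) of each weight matrix.

    # Minimal sketch (illustrative assumptions, not the memo's setup):
    # train a deep ReLU network with small-batch SGD and weight decay,
    # then inspect the effective rank of each weight matrix.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # Synthetic regression data (illustrative; not from the memo).
    X = torch.randn(512, 64)
    y = torch.randn(512, 1)

    # A small deep ReLU network; the memo's analysis also covers
    # convolutional, residual, and batch-normalized architectures.
    model = nn.Sequential(
        nn.Linear(64, 128), nn.ReLU(),
        nn.Linear(128, 128), nn.ReLU(),
        nn.Linear(128, 1),
    )

    # Small batch size and nonzero weight decay: the regime the memo analyzes.
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=1e-3)
    loss_fn = nn.MSELoss()
    batch_size = 8

    for epoch in range(200):
        perm = torch.randperm(X.size(0))
        for i in range(0, X.size(0), batch_size):
            idx = perm[i:i + batch_size]
            optimizer.zero_grad()
            loss_fn(model(X[idx]), y[idx]).backward()
            optimizer.step()

    def effective_rank(W, tol=1e-2):
        # Count singular values above tol times the largest singular value.
        s = torch.linalg.svdvals(W)
        return int((s > tol * s[0]).sum())

    for name, param in model.named_parameters():
        if param.dim() == 2:  # weight matrices only (skip bias vectors)
            print(name, tuple(param.shape),
                  "effective rank:", effective_rank(param.detach()))

Per the memo's claim, one would expect the printed effective ranks to fall well below each matrix's full dimension, and raising the batch size or removing weight decay should weaken the effect.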

