
dc.contributor.author | Xu, Mengjia |
dc.contributor.author | Galanti, Tomer |
dc.contributor.author | Rangamani, Akshay |
dc.contributor.author | Rosasco, Lorenzo |
dc.contributor.author | Poggio, Tomaso |
dc.date.accessioned | 2023-12-21T20:32:01Z |
dc.date.available | 2023-12-21T20:32:01Z |
dc.date.issued | 2023-12-21 |
dc.identifier.uri | https://hdl.handle.net/1721.1/153227 |
dc.description.abstract | It was always obvious that SGD has higher fluctuations at convergence than GD. It has also often been reported that SGD in deep ReLU networks has a low-rank bias in the weight matrices. A recent theoretical analysis linked SGD noise with the low-rank bias induced by the SGD updates associated with small minibatch sizes [1]. In this paper, we provide an empirical and theoretical analysis of the convergence of SGD vs GD, first for deep ReLU networks and then for the case of linear regression, where sharper estimates can be obtained and which is of independent interest. In the linear case, we prove that the components of the matrix W corresponding to the null space of the data matrix X converge to zero for both SGD and GD, provided the regularization term is non-zero (in the case of square loss; for exponential loss the result holds independently of regularization). The convergence rate, however, is exponential for SGD and linear for GD. Thus SGD has a much stronger bias than GD towards solutions for weight matrices W with high fluctuations and low rank, provided the initialization is from a random matrix (but not if W is initialized as a zero matrix). Thus SGD under exponential loss, or under the square loss with non-zero regularization, shows the coupled phenomenon of low rank and asymptotic noise. | en_US
dc.description.sponsorship | This material is based upon work supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216. | en_US
dc.relation.ispartofseries | CBMM Memo;144 |
dc.title | The Janus effects of SGD vs GD: high noise and low rank | en_US
dc.type | Article | en_US
dc.type | Technical Report | en_US
dc.type | Working Paper | en_US
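The linear-regression claim in the abstract (null-space components of W vanish under SGD and GD when the square loss carries a non-zero regularizer) can be illustrated numerically. What follows is a minimal sketch, not the authors' code, under assumptions this record does not state: rows of X are samples, predictions are X W^T, the loss is the mean squared error plus a weight-decay penalty (lam/2)||W||_F^2, the step size is fixed, and SGD samples minibatches without replacement. The helper names (null_space_projector, train, null_norm) and the toy problem sizes are hypothetical. Because every row of X is orthogonal to null(X), any minibatch X_B satisfies X_B P = 0, where P is the orthogonal projector onto null(X). The data term of the gradient therefore drops out of the null-space component, leaving only the regularizer to act on it. The sketch does not attempt to reproduce the paper's SGD vs GD rate comparison, which depends on details of the analysis not given in this record.

```python
import numpy as np


def null_space_projector(X, tol=1e-10):
    """Orthogonal projector onto null(X) = {v : X @ v = 0} for X of shape (n, d)."""
    _, s, Vt = np.linalg.svd(X)          # full SVD: Vt has shape (d, d)
    rank = int(np.sum(s > tol))
    V_null = Vt[rank:].T                 # columns form an orthonormal basis of null(X)
    return V_null @ V_null.T


def train(X, Y, W0, lr, lam, steps, batch_size=None, seed=0):
    """Hypothetical trainer: plain GD if batch_size is None, else minibatch SGD.

    Each step follows the gradient of the (minibatch) objective
        0.5 * ||X_B W^T - Y_B||_F^2 / |B| + 0.5 * lam * ||W||_F^2.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    W = W0.copy()
    for _ in range(steps):
        if batch_size is None:
            idx = np.arange(n)           # full batch: plain GD
        else:
            idx = rng.choice(n, size=batch_size, replace=False)
        Xb, Yb = X[idx], Y[idx]
        residual = Xb @ W.T - Yb         # shape (|B|, k)
        grad = residual.T @ Xb / len(idx) + lam * W
        W = W - lr * grad
    return W


# Assumed toy problem: fewer samples than features, so null(X) is non-trivial.
rng = np.random.default_rng(0)
n, d, k = 5, 20, 3
X = rng.standard_normal((n, d)) / np.sqrt(d)
Y = rng.standard_normal((n, k))
W0 = rng.standard_normal((k, d))         # random initialization
P = null_space_projector(X)


def null_norm(W):
    """Frobenius norm of the component of W acting on null(X)."""
    return np.linalg.norm(W @ P)


results = {}
for name, bs in [("GD", None), ("SGD", 1)]:
    for lam in (0.0, 0.1):
        W = train(X, Y, W0, lr=0.1, lam=lam, steps=2000, batch_size=bs)
        results[(name, lam)] = null_norm(W)
        print(f"{name:>3} lam={lam}: ||W P|| = {results[(name, lam)]:.2e} "
              f"(initial {null_norm(W0):.2e})")
```

With lam = 0 the null-space norm stays at its initial value for both optimizers, since only the regularizer acts on that component; with lam = 0.1 it falls to near zero for both. Starting from W0 = 0 the null-space component is zero from the outset and does not change.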
