Show simple item record

dc.contributor.author: HaoChen, Jeff
dc.contributor.author: Sra, Suvrit
dc.date.accessioned: 2021-11-03T15:30:23Z
dc.date.available: 2021-11-03T15:30:23Z
dc.date.issued: 2019-06
dc.identifier.uri: https://hdl.handle.net/1721.1/137223
dc.description.abstract: A long-standing problem in optimization is proving that RandomShuffle, the without-replacement version of SGD, converges faster than (the usual) with-replacement SGD. Building upon (Gürbüzbalaban et al., 2015b), we present the first non-asymptotic results for this problem, proving that after a reasonable number of epochs RandomShuffle converges faster than SGD. Specifically, we prove that for strongly convex, second-order smooth functions, the iterates of RandomShuffle converge to the optimal solution as O(1/T² + n³/T³), where n is the number of components in the objective and T is the number of iterations. This result implies that after O(√n) epochs, RandomShuffle is strictly better than SGD (which converges as O(1/T)). The key step toward showing this better dependence on T is the introduction of n into the bound; and as our analysis shows, in general a dependence on n is unavoidable without further changes. To understand how RandomShuffle works in practice, we further explore two valuable settings: data sparsity and over-parameterization. For sparse data, RandomShuffle has the rate Õ(1/T²), again strictly better than SGD. Under a setting closely related to over-parameterization, RandomShuffle is shown to converge faster than SGD after any arbitrary number of iterations. Finally, we extend the analysis of RandomShuffle to smooth convex and some non-convex functions. [en_US]
dc.description.sponsorship: NSF-CAREER (Award 1846088) [en_US]
dc.language.iso: en
dc.relation.isversionof: http://proceedings.mlr.press/v97/haochen19a.html [en_US]
dc.rights: Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. [en_US]
dc.source: Proceedings of Machine Learning Research [en_US]
dc.title: Random shuffling beats SGD after finite epochs [en_US]
dc.type: Article [en_US]
dc.identifier.citation: HaoChen, Jeff and Sra, Suvrit. 2019. "Random shuffling beats SGD after finite epochs." 36th International Conference on Machine Learning, ICML 2019, 2019-June.
dc.contributor.department: Massachusetts Institute of Technology. Institute for Data, Systems, and Society [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science [en_US]
dc.relation.journal: 36th International Conference on Machine Learning, ICML 2019 [en_US]
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
eprint.status: http://purl.org/eprint/status/NonPeerReviewed [en_US]
dc.date.updated: 2021-04-16T13:02:53Z
dspace.orderedauthors: Chen, JH; Sra, S [en_US]
dspace.date.submission: 2021-04-16T13:02:54Z
mit.journal.volume: 2019-June [en_US]
mit.license: PUBLISHER_POLICY
mit.metadata.status: Publication Information Needed [en_US]
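
To make the comparison in the abstract above concrete: with T = kn total iterations (k epochs over n components), RandomShuffle's O(1/T² + n³/T³) bound becomes O(1/(k²n²) + 1/k³), which falls below SGD's O(1/T) = O(1/(kn)) once k exceeds roughly √n. The sketch below contrasts the two sampling schemes on a toy strongly convex least-squares problem; this is an illustrative assumption on my part, not the paper's experimental setup, and the objective, step-size schedule, and epoch counts are arbitrary choices.

import numpy as np

rng = np.random.default_rng(0)

# Toy strongly convex objective: f(x) = (1/n) * sum_i (a_i^T x - b_i)^2.
# Illustrative only; not the paper's setup.
n, d = 100, 10
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)  # reference optimum of f

def grad_i(x, i):
    # Gradient of the i-th component f_i(x) = (a_i^T x - b_i)^2.
    return 2.0 * (A[i] @ x - b[i]) * A[i]

def run(epochs, with_replacement, lr0=0.02):
    # n stochastic gradient steps per epoch, differing only in how
    # the component index is drawn.
    x = np.zeros(d)
    t = 0
    for _ in range(epochs):
        if with_replacement:
            order = rng.integers(0, n, size=n)  # usual SGD: i.i.d. indices
        else:
            order = rng.permutation(n)          # RandomShuffle: fresh permutation each epoch
        for i in order:
            t += 1
            x -= (lr0 / (1.0 + lr0 * t / n)) * grad_i(x, i)  # decaying step, illustrative
    return float(np.linalg.norm(x - x_star))

for epochs in (1, 10, 100):
    print(f"epochs={epochs:3d}  SGD: {run(epochs, True):.4f}  "
          f"RandomShuffle: {run(epochs, False):.4f}")

At matched epoch counts, the without-replacement runs typically end closer to x_star, loosely mirroring the O(1/T²)-versus-O(1/T) separation the abstract describes, though a toy run like this is not evidence for the theorem.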

