dc.contributor.author | HaoChen, Jeff | |
dc.contributor.author | Sra, Suvrit | |
dc.date.accessioned | 2021-11-03T15:30:23Z | |
dc.date.available | 2021-11-03T15:30:23Z | |
dc.date.issued | 2019-06 | |
dc.identifier.uri | https://hdl.handle.net/1721.1/137223 | |
dc.description.abstract | A long-standing problem in optimization is proving that RANDOMSHUFFLE, the without-replacement version of SGD, converges faster than (the usual) with-replacement SGD. Building upon (Gürbüzbalaban et al., 2015b), we present the first non-asymptotic results for this problem, proving that after a reasonable number of epochs RANDOMSHUFFLE converges faster than SGD. Specifically, we prove that for strongly convex, second-order smooth functions, the iterates of RANDOMSHUFFLE converge to the optimal solution as O(1/T² + n³/T³), where n is the number of components in the objective, and T is the number of iterations. This result implies that after O(√n) epochs, RANDOMSHUFFLE is strictly better than SGD (which converges as O(1/T)). The key step toward showing this better dependence on T is the introduction of n into the bound; and as our analysis shows, in general a dependence on n is unavoidable without further changes. To understand how RANDOMSHUFFLE works in practice, we further explore two valuable settings: data sparsity and over-parameterization. For sparse data, RANDOMSHUFFLE achieves the rate Õ(1/T²), again strictly better than SGD. Under a setting closely related to over-parameterization, RANDOMSHUFFLE is shown to converge faster than SGD after any arbitrary number of iterations. Finally, we extend the analysis of RANDOMSHUFFLE to smooth convex and some non-convex functions. | en_US
dc.description.sponsorship | NSF-CAREER (Award 1846088) | en_US |
dc.language.iso | en | |
dc.relation.isversionof | http://proceedings.mlr.press/v97/haochen19a.html | en_US |
dc.rights | Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. | en_US |
dc.source | Proceedings of Machine Learning Research | en_US |
dc.title | Random shuffling beats SGD after finite epochs | en_US |
dc.type | Article | en_US |
dc.identifier.citation | HaoChen, Jeff and Sra, Suvrit. 2019. "Random shuffling beats SGD after finite epochs." 36th International Conference on Machine Learning, ICML 2019, 2019-June. | |
dc.contributor.department | Massachusetts Institute of Technology. Institute for Data, Systems, and Society | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | en_US |
dc.relation.journal | 36th International Conference on Machine Learning, ICML 2019 | en_US |
dc.eprint.version | Final published version | en_US |
dc.type.uri | http://purl.org/eprint/type/ConferencePaper | en_US |
eprint.status | http://purl.org/eprint/status/NonPeerReviewed | en_US |
dc.date.updated | 2021-04-16T13:02:53Z | |
dspace.orderedauthors | Chen, JH; Sra, S | en_US |
dspace.date.submission | 2021-04-16T13:02:54Z | |
mit.journal.volume | 2019-June | en_US |
mit.license | PUBLISHER_POLICY | |
mit.metadata.status | Publication Information Needed | en_US |