Show simple item record

dc.contributor.author: Altschuler, Jason
dc.contributor.author: Parrilo, Pablo
dc.date.accessioned: 2025-01-29T21:42:03Z
dc.date.available: 2025-01-29T21:42:03Z
dc.date.issued: 2024-12-13
dc.identifier.issn: 0004-5411
dc.identifier.uri: https://hdl.handle.net/1721.1/158132
dc.description.abstract: Can we accelerate the convergence of gradient descent without changing the algorithm, just by judiciously choosing stepsizes? Surprisingly, we show that the answer is yes. Our proposed Silver Stepsize Schedule optimizes strongly convex functions in κ^(log_ρ 2) ≈ κ^0.7864 iterations, where ρ = 1 + √2 is the silver ratio and κ is the condition number. This is intermediate between the textbook unaccelerated rate κ and the accelerated rate κ^(1/2) due to Nesterov in 1983. The non-strongly convex setting is conceptually identical, and standard black-box reductions imply an analogous partially accelerated rate ε^(−log_ρ 2) ≈ ε^(−0.7864). We conjecture and provide partial evidence that these rates are optimal among all stepsize schedules. The Silver Stepsize Schedule is constructed recursively in a fully explicit way. It is non-monotonic, fractal-like, and approximately periodic with period κ^(log_ρ 2). This leads to a phase transition in the convergence rate: initially super-exponential (acceleration regime), then exponential (saturation regime). The core algorithmic intuition is hedging between individually suboptimal strategies, short steps and long steps, since bad cases for the former are good cases for the latter, and vice versa. Properly combining these stepsizes yields faster convergence due to the misalignment of worst-case functions. The key challenge in proving this speedup is enforcing long-range consistency conditions along the algorithm's trajectory. We do this by developing a technique that recursively glues constraints from different portions of the trajectory, thus removing a key stumbling block in previous analyses of optimization algorithms. More broadly, we believe that the concepts of hedging and multi-step descent have the potential to be powerful algorithmic paradigms in a variety of contexts in optimization and beyond.
This paper publishes and extends the first author's 2018 Master's Thesis (advised by the second author), which established for the first time that judiciously choosing stepsizes can enable acceleration in convex optimization. Prior to this thesis, the only such result was for the special case of quadratic optimization, due to Young in 1953. en_US
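The abstract describes the schedule as recursive, fully explicit, and fractal-like. As an illustration only, the sketch below follows one common statement of the construction (for the smooth convex case): a schedule of length 2^k − 1 built by the doubling rule π_{j+1} = π_j ∘ (1 + ρ^(j−1)) ∘ π_j with ρ = 1 + √2, together with an equivalent closed form via the 2-adic valuation. The exact constants are an assumption here and should be checked against the paper itself.

```python
import math

RHO = 1 + math.sqrt(2)  # the silver ratio rho = 1 + sqrt(2)

def silver_schedule(k: int) -> list:
    """Fractal doubling construction: a stepsize schedule of length 2^k - 1.

    pi_{j+1} = pi_j + [1 + RHO**(j-1)] + pi_j, starting from the empty list.
    Stepsizes are in units of 1/L for an L-smooth function; the constants
    follow one common statement of the schedule, not this record verbatim.
    """
    schedule = []
    for j in range(k):
        schedule = schedule + [1 + RHO ** (j - 1)] + schedule
    return schedule

def silver_step(t: int) -> float:
    """Equivalent closed form: h_t = 1 + RHO**(v(t) - 1), v(t) = 2-adic valuation of t."""
    v = 0
    while t % 2 == 0:
        t //= 2
        v += 1
    return 1 + RHO ** (v - 1)

# The two constructions agree; the schedule is non-monotonic and fractal-like,
# e.g. silver_schedule(3) = [sqrt(2), 2, sqrt(2), 2 + sqrt(2), sqrt(2), 2, sqrt(2)].
sched = silver_schedule(3)
assert sched == [silver_step(t) for t in range(1, 8)]
```

The doubling rule makes the fractal, approximately periodic structure noted in the abstract visible directly: each longer schedule embeds two copies of the previous one around a single new, longer step.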
dc.publisher: ACM en_US
dc.relation.isversionof: https://doi.org/10.1145/3708502 en_US
dc.rights: Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. en_US
dc.source: Association for Computing Machinery en_US
dc.title: Acceleration by Stepsize Hedging: Multi-Step Descent and the Silver Stepsize Schedule en_US
dc.type: Article en_US
dc.identifier.citation: Altschuler, Jason and Parrilo, Pablo. 2024. "Acceleration by Stepsize Hedging: Multi-Step Descent and the Silver Stepsize Schedule." Journal of the ACM.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science en_US
dc.relation.journal: Journal of the ACM en_US
dc.identifier.mitlicense: PUBLISHER_POLICY
dc.eprint.version: Final published version en_US
dc.type.uri: http://purl.org/eprint/type/JournalArticle en_US
eprint.status: http://purl.org/eprint/status/PeerReviewed en_US
dc.date.updated: 2025-01-01T08:52:06Z
dc.language.rfc3066: en
dc.rights.holder: The author(s)
dspace.date.submission: 2025-01-01T08:52:06Z
mit.license: PUBLISHER_POLICY
mit.metadata.status: Authority Work and Publication Information Needed en_US

