When cyclic coordinate descent outperforms randomized coordinate descent
Author(s): Gürbüzbalaban, Mert; Ozdaglar, Asuman E.; Parrilo, Pablo A.; Vanli, Nuri Denizcan
The coordinate descent (CD) method is a classical optimization algorithm that has seen a revival of interest because of its competitive performance in machine learning applications. A number of recent papers have provided convergence rate estimates for its deterministic (cyclic) and randomized variants, which differ in how the update coordinates are selected. These estimates suggest that randomized coordinate descent (RCD) performs better than cyclic coordinate descent (CCD), although numerical experiments do not provide clear support for this comparison. In this paper, we provide examples and, more generally, problem classes for which CCD (or CD with any deterministic order) is faster than RCD in terms of asymptotic worst-case convergence. Furthermore, we provide lower and upper bounds on the improvement in the convergence rate of CCD relative to RCD, which depend on the deterministic order used. We also characterize the best deterministic order (the one that leads to the maximum improvement in convergence rate) in terms of the combinatorial properties of the Hessian matrix of the objective function.
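As a concrete illustration of the two selection rules compared in the abstract (a minimal sketch, not code from the paper), consider minimizing the quadratic f(x) = ½xᵀAx − bᵀx with A symmetric positive definite, where each coordinate step is an exact one-dimensional minimization. The only difference between CCD and RCD below is the order in which coordinates are updated:

```python
import numpy as np

def coord_update(A, b, x, i):
    # Exact minimization of f(x) = 0.5 x^T A x - b^T x along coordinate i:
    # set the i-th partial derivative to zero, holding the other coordinates fixed.
    x[i] = (b[i] - A[i] @ x + A[i, i] * x[i]) / A[i, i]

def ccd(A, b, x0, sweeps):
    # Cyclic coordinate descent: fixed deterministic order 0, 1, ..., n-1 each sweep.
    x = x0.copy()
    n = len(b)
    for _ in range(sweeps):
        for i in range(n):
            coord_update(A, b, x, i)
    return x

def rcd(A, b, x0, sweeps, rng):
    # Randomized coordinate descent: uniformly random coordinate at each step,
    # matched to the same total number of coordinate updates as CCD.
    x = x0.copy()
    n = len(b)
    for _ in range(sweeps * n):
        coord_update(A, b, x, int(rng.integers(n)))
    return x
```

For a quadratic, CCD with this exact line search coincides with the Gauss–Seidel iteration, which is one way the paper's comparison of deterministic orders connects to classical linear-algebra rates; both variants converge to the minimizer x* = A⁻¹b for positive definite A.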
Department: Massachusetts Institute of Technology. Laboratory for Information and Decision Systems; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Advances in Neural Information Processing Systems 30 (NIPS 2017)
Neural Information Processing Systems Foundation, Inc.
Gürbüzbalaban, Mert, Asuman Ozdaglar, Pablo A. Parrilo and N. Denizcan Vanli. "When cyclic coordinate descent outperforms randomized coordinate descent." Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, Dec. 4-9 2017.
Final published version