Robust accelerated gradient methods for machine learning
Author(s)
Fallah, Alireza.
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Asuman Ozdaglar.
Abstract
In this thesis, we study the problem of minimizing a smooth and strongly convex function, which arises in several areas, including regularized regression problems in machine learning. To solve this optimization problem, we consider first-order methods, which are popular due to their scalability to large data sets, and we study the case in which exact gradient information is not available. In this setting, a naive implementation of classical first-order algorithms need not converge and may even accumulate noise. This motivates considering robustness to noise as another metric, alongside rate, in the design of fast algorithms. To address this problem, we first propose a definition of the robustness of an algorithm in terms of the asymptotic expected suboptimality of its iterate sequence relative to the input noise power. We focus on Gradient Descent and Accelerated Gradient methods and develop a framework based on a dynamical-system representation of these algorithms to characterize their convergence rate and robustness to noise using tools from control theory and optimization. We provide explicit expressions for the convergence rate and robustness of both algorithms in the quadratic case, and we also derive tractable and tight upper bounds for general smooth and strongly convex functions. We further develop a computational framework for choosing the parameters of these algorithms to achieve a particular trade-off between robustness and rate. As a second contribution, we consider algorithms that converge to the optimal solution, i.e., achieve perfect robustness. The prior literature provides lower bounds on the rate of decay of suboptimality in terms of the initial distance to optimality (in the deterministic case) and the error due to gradient noise (in the stochastic case). We design a novel multistage accelerated algorithm that is universally optimal, achieving both of these lower bounds simultaneously without knowledge of the initial optimality gap or the noise characteristics. Finally, we illustrate the behavior of our algorithm through numerical experiments.
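For intuition about the rate/robustness trade-off the abstract describes, the following is a minimal numerical sketch (not taken from the thesis) contrasting gradient descent with Nesterov's accelerated gradient on a strongly convex quadratic when only noisy gradient evaluations are available. The problem instance, step size, momentum parameter, and noise level are hypothetical choices made purely for illustration.

```python
import numpy as np

# Illustrative sketch (not the thesis's algorithm): gradient descent (GD) vs.
# Nesterov's accelerated gradient (AG) on f(x) = 0.5 * x^T A x with additive
# gradient noise. All parameters below are hypothetical choices.

rng = np.random.default_rng(0)

d = 50
mu, L = 1.0, 100.0                       # strong convexity and smoothness constants
A = np.diag(np.linspace(mu, L, d))       # quadratic with eigenvalues in [mu, L]

def f(x):
    return 0.5 * x @ (A @ x)             # optimum is x* = 0 with f(x*) = 0

def noisy_grad(x, sigma):
    return A @ x + sigma * rng.standard_normal(d)   # exact gradient plus noise

def run_gd(x0, steps, sigma, alpha):
    x, subopt = x0.copy(), []
    for _ in range(steps):
        x = x - alpha * noisy_grad(x, sigma)
        subopt.append(f(x))
    return subopt

def run_ag(x0, steps, sigma, alpha, beta):
    x, x_prev, subopt = x0.copy(), x0.copy(), []
    for _ in range(steps):
        y = x + beta * (x - x_prev)                  # momentum extrapolation
        x_prev, x = x, y - alpha * noisy_grad(y, sigma)
        subopt.append(f(x))
    return subopt

x0 = rng.standard_normal(d)
sigma, steps = 0.5, 2000
kappa = L / mu

gd = run_gd(x0, steps, sigma, alpha=1.0 / L)
ag = run_ag(x0, steps, sigma, alpha=1.0 / L,
            beta=(np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1))

# AG typically drives the suboptimality down faster at first, but under
# persistent gradient noise its asymptotic expected suboptimality can be
# larger than GD's, which is the trade-off between rate and robustness.
print("GD mean suboptimality over last 1000 steps:", np.mean(gd[-1000:]))
print("AG mean suboptimality over last 1000 steps:", np.mean(ag[-1000:]))
```

Running this sketch with different step sizes and noise levels shows how tuning the parameters shifts an algorithm along the rate/robustness trade-off curve; the thesis characterizes this trade-off analytically rather than empirically.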
Description
Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019. Cataloged from PDF version of thesis. Includes bibliographical references (pages 91-95).
Date issued
2019
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.