Robust accelerated gradient methods for machine learning
Author(s)
Fallah, Alireza.
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Asuman Ozdaglar.
Abstract
In this thesis, we study the problem of minimizing a smooth and strongly convex function, which arises in several areas, including regularized regression problems in machine learning. To solve this optimization problem, we consider first-order methods, which are popular due to their scalability to large data sets, and we study the case in which exact gradient information is not available. In this setting, a naive implementation of classical first-order algorithms need not converge and may even accumulate noise. This motivates considering robustness to noise as another metric, alongside rate, in the design of fast algorithms. To address this problem, we first propose a definition of the robustness of an algorithm in terms of the asymptotic expected suboptimality of its iterate sequence relative to the input noise power. We focus on Gradient Descent and Accelerated Gradient methods and develop a framework based on a dynamical-system representation of these algorithms to characterize their convergence rate and robustness to noise using tools from control theory and optimization. We provide explicit expressions for the convergence rate and robustness of both algorithms in the quadratic case, and we also derive tractable and tight upper bounds for general smooth and strongly convex functions. We further develop a computational framework for choosing the parameters of these algorithms to achieve a particular trade-off between robustness and rate. As a second contribution, we consider algorithms that converge to the optimal solution, i.e., achieve perfect robustness. The prior literature provides lower bounds on the rate of decay of suboptimality in terms of the initial distance to optimality (in the deterministic case) and the error due to gradient noise (in the stochastic case). We design a novel multistage accelerated algorithm that is universally optimal, achieving both of these lower bounds simultaneously without knowledge of the initial optimality gap or the noise characteristics. Finally, we illustrate the behavior of our algorithm through numerical experiments.
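For intuition about the rate/robustness trade-off the abstract describes, the following is a minimal numerical sketch (not taken from the thesis) contrasting gradient descent with Nesterov's accelerated gradient on a strongly convex quadratic when only noisy gradient evaluations are available. The problem instance, step size, momentum parameter, and noise level are hypothetical choices made purely for illustration.

```python
import numpy as np

# Illustrative sketch (not the thesis's algorithm): gradient descent (GD) vs.
# Nesterov's accelerated gradient (AG) on f(x) = 0.5 * x^T A x with additive
# gradient noise. All parameters below are hypothetical choices.

rng = np.random.default_rng(0)

d = 50
mu, L = 1.0, 100.0                       # strong convexity and smoothness constants
A = np.diag(np.linspace(mu, L, d))       # quadratic with eigenvalues in [mu, L]

def f(x):
    return 0.5 * x @ (A @ x)             # optimum is x* = 0 with f(x*) = 0

def noisy_grad(x, sigma):
    return A @ x + sigma * rng.standard_normal(d)   # exact gradient plus noise

def run_gd(x0, steps, sigma, alpha):
    x, subopt = x0.copy(), []
    for _ in range(steps):
        x = x - alpha * noisy_grad(x, sigma)
        subopt.append(f(x))
    return subopt

def run_ag(x0, steps, sigma, alpha, beta):
    x, x_prev, subopt = x0.copy(), x0.copy(), []
    for _ in range(steps):
        y = x + beta * (x - x_prev)                  # momentum extrapolation
        x_prev, x = x, y - alpha * noisy_grad(y, sigma)
        subopt.append(f(x))
    return subopt

x0 = rng.standard_normal(d)
sigma, steps = 0.5, 2000
kappa = L / mu

gd = run_gd(x0, steps, sigma, alpha=1.0 / L)
ag = run_ag(x0, steps, sigma, alpha=1.0 / L,
            beta=(np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1))

# AG typically drives the suboptimality down faster at first, but under
# persistent gradient noise its asymptotic expected suboptimality can be
# larger than GD's, which is the trade-off between rate and robustness.
print("GD mean suboptimality over last 1000 steps:", np.mean(gd[-1000:]))
print("AG mean suboptimality over last 1000 steps:", np.mean(ag[-1000:]))
```

Running this sketch with different step sizes and noise levels shows how tuning the parameters shifts an algorithm along the rate/robustness trade-off curve; the thesis characterizes this trade-off analytically rather than empirically.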
Description
Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019. Cataloged from PDF version of thesis. Includes bibliographical references (pages 91-95).
Date issued
2019
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.