Understanding and Overcoming Optimization Barriers in Non-convex and Non-smooth Machine Learning

Author(s)

Gatmiry, Khashayar

Advisor

Jegelka, Stefanie

Kelner, Jonathan A.

Terms of use

In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/

Abstract

At their core, our machine learning systems are trained by solving an optimization problem, where the goal is to minimize a predefined objective function by adjusting model parameters based on the data. Despite the wealth of structure and prior knowledge present in the data and feedback, our training methods remain relatively simple and independent of this structure. In spite of, or perhaps because of, this simplicity, these methods are often lacking in theoretical guarantees. To design machine learning algorithms that are less data-hungry while ensuring theoretical guarantees on both computational efficiency and output validity, it is essential to better understand and leverage the rich structure within the learning setup and the data distribution, e.g. by altering the geometry of the solution space or adjusting the objective function to induce a more effective learning procedure. This approach moves beyond classical algorithm design, which focuses primarily on handling worst-case instances. This thesis investigates the optimization landscape of central learning problems and develops geometric and analytic schemes adapted to their structure, leading to algorithms with superior computational and statistical performance. In addition, it seeks to advance our mathematical understanding of the principles underlying the success of deep learning.

Date issued

2025-09

URI

https://hdl.handle.net/1721.1/164603

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Collections

Doctoral Theses