A Unified Approach to Controlling Implicit Regularization Using Mirror Descent

Author(s)
Sun, Haoyuan
Download
Thesis PDF (1.389 MB)
Advisor
Jadbabaie, Ali
Azizan, Navid
Terms of use
In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/
Abstract
Inspired by the remarkable performance of deep neural networks, understanding the generalization of overparameterized models and the effect of optimization algorithms on it has become an increasingly popular question. In particular, there has been substantial effort to characterize the solutions preferred by optimization algorithms such as gradient descent (GD), a property referred to as implicit regularization. For example, it has been argued that GD tends to induce an implicit $\ell_2$-norm regularization in regression and classification problems. Despite significant progress in this space, the implicit bias of various algorithms is either specific to a particular geometry or established only for a particular class of learning problems, and a general approach for controlling the implicit regularization is lacking. To this end, we present a unified approach via mirror descent (MD), an important generalization of GD, to control implicit regularization in both regression and classification settings. In particular, we show that MD with a general class of homogeneous potential functions converges in direction to a generalized maximum-margin solution for linear classification problems, thereby answering an open question in the classification setting. Additionally, we show that under suitable conditions, MD can be implemented efficiently with minimal overhead compared to GD and enjoys fast convergence to the maximum-margin solution induced by its implicit bias. Using comprehensive experiments with both linear and deep neural network models, we demonstrate that MD is a versatile method for producing learned models with different regularizers, which in turn lead to different generalization performances.
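To illustrate the mechanism the abstract refers to, the following is a minimal sketch of mirror descent on an overparameterized linear regression problem. It is not the thesis's algorithm or experimental setup; the q-norm potential, step size, iteration count, and synthetic data are illustrative assumptions. It shows the basic MD update (a gradient step taken in the dual space defined by the potential's gradient map) and how the choice of potential steers which interpolating solution is reached, with q = 2 recovering plain gradient descent.

```python
# A minimal sketch of mirror descent with a q-norm potential psi(w) = ||w||_q^q / q
# on an overparameterized linear regression problem. The potential, step size, and
# synthetic data are illustrative assumptions, not the thesis's exact setup.
import numpy as np

def grad_potential(w, q):
    """Mirror map: gradient of psi(w) = ||w||_q^q / q."""
    return np.sign(w) * np.abs(w) ** (q - 1)

def grad_potential_inv(z, q):
    """Inverse mirror map (valid for q > 1)."""
    return np.sign(z) * np.abs(z) ** (1.0 / (q - 1))

def mirror_descent(X, y, q=3.0, lr=1e-2, steps=20000):
    """Run mirror descent on the squared loss. Among the many interpolating
    solutions, MD tends toward the one favored by the chosen potential,
    illustrating how the potential controls the implicit regularization."""
    n, d = X.shape
    w = np.zeros(d)
    z = grad_potential(w, q)          # dual (mirror) variable
    for _ in range(steps):
        grad_loss = X.T @ (X @ w - y) / n
        z -= lr * grad_loss           # gradient step in the dual space
        w = grad_potential_inv(z, q)  # map back to the primal space
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 100))    # underdetermined: many interpolating solutions
    y = rng.normal(size=20)
    w_q3 = mirror_descent(X, y, q=3.0)
    w_q2 = mirror_descent(X, y, q=2.0)  # q = 2 reduces to plain gradient descent
    print("residual (q=3):", np.linalg.norm(X @ w_q3 - y))
    print("residual (q=2):", np.linalg.norm(X @ w_q2 - y))
    print("||w||_3 (q=3 run):", np.linalg.norm(w_q3, 3))
    print("||w||_2 (q=2 run):", np.linalg.norm(w_q2, 2))
```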
Date issued
2023-06
URI
https://hdl.handle.net/1721.1/151464
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses
