Optimization and Generalization of Minimax Algorithms
Author(s)
Pattathil, Sarath
Advisor
Ozdaglar, Asuman
Abstract
This thesis explores minimax formulations of machine learning and multi-agent learning problems, focusing on algorithmic optimization and generalization performance.

The first part of the thesis studies the smooth convex-concave minimax problem, providing a unified analysis of widely used algorithms such as Extra-Gradient (EG) and Optimistic Gradient Descent Ascent (OGDA), whose convergence behavior had not been systematically understood. We derive convergence rates for these algorithms in the convex-concave setting and show that they are effective because they approximate the Proximal Point (PP) method, which converges to the solution at a fast rate but is impractical to implement. In the next chapter, we extend our study to nonconvex-nonconcave problems. These problems are generally challenging to solve: a solution may not be well defined, and even when a solution exists, computing it may be intractable. We identify a class of nonconvex-nonconcave problems that do have well-defined and computationally tractable solutions. Leveraging the concepts developed in the first chapter, we design algorithms that efficiently tackle this special class of nonconvex-nonconcave problems.

The final part of this thesis addresses generalization. In many cases, such as GANs and adversarial training, the objective function for finding the saddle point can be written as an expectation over the data distribution. However, since we often do not have direct access to this distribution, we instead solve the empirical problem, which averages over the available dataset. The final chapter evaluates the quality of solutions to the empirical problem relative to the original population problem. Existing metrics, such as the primal risk, that are used to assess generalization in the minimax setting are found to be inadequate for capturing the generalization of minimax learners. This motivates the proposal of a new metric, the primal gap, which overcomes these limitations. We then use this metric to investigate the generalization performance of popular algorithms such as Gradient Descent Ascent (GDA) and Gradient Descent-Max (GDMax).
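To make the algorithmic ideas in the first part concrete, the following is a minimal sketch (written for this summary, not taken from the thesis) of the EG and OGDA iterations on a toy bilinear problem f(x, y) = xᵀAy, whose saddle point is the origin. The choice of A, the step size eta, and the iteration count are illustrative assumptions. Both methods add an inexpensive look-ahead or gradient-correction term to plain gradient descent ascent, which is what lets them approximate the implicit Proximal Point update.

```python
# Illustrative sketch of EG and OGDA on the toy bilinear convex-concave problem
# f(x, y) = x^T A y, whose unique saddle point is (x, y) = (0, 0).
# A, eta, and the iteration count are arbitrary choices for this example.
import numpy as np

rng = np.random.default_rng(0)
A, _ = np.linalg.qr(rng.standard_normal((5, 5)))  # orthogonal A: smoothness constant 1
eta = 0.2                                         # illustrative step size

def grads(x, y):
    # For f(x, y) = x^T A y:  grad_x f = A y,  grad_y f = A^T x
    return A @ y, A.T @ x

def extra_gradient(x, y, steps=500):
    for _ in range(steps):
        gx, gy = grads(x, y)
        # Extrapolate to a midpoint ...
        xm, ym = x - eta * gx, y + eta * gy
        gxm, gym = grads(xm, ym)
        # ... then update with the midpoint gradient; this cheap look-ahead
        # approximates the implicit Proximal Point (PP) update.
        x, y = x - eta * gxm, y + eta * gym
    return x, y

def ogda(x, y, steps=500):
    gx_prev, gy_prev = grads(x, y)
    for _ in range(steps):
        gx, gy = grads(x, y)
        # OGDA corrects the current gradient with the difference of successive
        # gradients, another inexpensive approximation of the PP step.
        x, y = x - eta * (2 * gx - gx_prev), y + eta * (2 * gy - gy_prev)
        gx_prev, gy_prev = gx, gy
    return x, y

x0, y0 = rng.standard_normal(5), rng.standard_normal(5)
for name, method in [("EG", extra_gradient), ("OGDA", ogda)]:
    x, y = method(x0.copy(), y0.copy())
    print(name, np.linalg.norm(x), np.linalg.norm(y))  # both norms shrink toward 0
```

On this bilinear example, plain GDA with the same step size would spiral away from the saddle point; the extrapolation (EG) and gradient-correction (OGDA) terms are what restore convergence, consistent with the PP-approximation view described above.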
Date issued
2023-09
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology