Transport and Beyond: Efficient Optimization over Probability Distributions

Author(s)

Altschuler, Jason M.

Advisor

Parrilo, Pablo A.

Terms of use

In Copyright - Educational Use Permitted. Copyright MIT. http://rightsstatements.org/page/InC-EDU/1.0/

Abstract

The core of classical optimization focuses on the setting where decision variables are vectors in Rⁿ. However, modern applications throughout machine learning, applied mathematics, and engineering give rise to high-dimensional optimization problems in which the decision variables are probability distributions. Can such optimization problems be solved efficiently? This thesis presents two interrelated lines of work in this direction through the common thread of Optimal Transport. A unifying theme is the optimization of joint probability distributions with constrained marginals.

Part I of this thesis considers Optimal Transport and other optimization problems over joint distributions with two constrained marginals. Such tasks are fundamental in alignment problems, matrix problems, graph problems, and more. Chapters 2-4 establish near-linear runtimes for approximation algorithms for several classical problems under this umbrella: Optimal Transport, Minimum-Mean-Cycle, Matrix Balancing, and Matrix Scaling. Two recurring key themes are the use of entropic regularization for exploiting separability of optimization constraints, and the use of probabilistic inequalities for obtaining dimension-free convergence bounds. A dictionary is presented that unifies these various problems, which were historically studied in disparate communities.

Part II of this thesis considers Multimarginal Optimal Transport (MOT) and other optimization problems over joint distributions with many constrained marginals. Despite the syntactic similarities with the problems in Part I, these problems require fundamentally different algorithms and analyses. The key issue limiting the many applications of MOT is that, in general, MOT requires time exponential in the number of marginals k and their support sizes n. Chapters 5-6 develop a general theory about what "structure" makes MOT solvable in time that is polynomial in n and k. We demonstrate this general theory on applications in diverse fields ranging from operations research to data science to fluid dynamics to quantum chemistry. Chapter 7 dedicates special attention to the popular MOT application of Wasserstein barycenters, resolving the complexity of this problem and uncovering the subtle dependence of the answer on the dimension.
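
To make the abstract's mention of entropic regularization concrete, the following is a minimal sketch of Sinkhorn-style alternating scaling, a standard method for entropically regularized Optimal Transport between two discrete marginals. It is an illustration only and is not taken from the thesis: the function name `sinkhorn`, the parameters `eta` and `n_iter`, and the toy data are all assumptions made here. Each update touches only the row constraints or only the column constraints, which is one way to read "exploiting separability of optimization constraints".

```python
import numpy as np

def sinkhorn(C, r, c, eta=5.0, n_iter=1000):
    """Illustrative sketch (hypothetical helper, not the thesis's code).

    Approximates the entropically regularized transport plan between
    marginals r (rows) and c (columns) for cost matrix C. The parameter
    eta is an assumed regularization strength: larger eta means weaker
    regularization.
    """
    K = np.exp(-eta * C)              # entrywise Gibbs kernel
    u = np.ones_like(r)
    v = np.ones_like(c)
    for _ in range(n_iter):
        u = r / (K @ v)               # rescale rows to match r
        v = c / (K.T @ u)             # rescale columns to match c
    return u[:, None] * K * v[None, :]

# Toy data (assumed for illustration): uniform marginals on 5 random
# points per side, with squared-distance cost.
rng = np.random.default_rng(0)
n = 5
x, y = rng.random(n), rng.random(n)
C = (x[:, None] - y[None, :]) ** 2
r = np.full(n, 1.0 / n)
c = np.full(n, 1.0 / n)

P = sinkhorn(C, r, c)
row_error = np.abs(P.sum(axis=1) - r).sum()   # L1 violation of row marginals
cost = float((P * C).sum())                   # transport cost of the plan
```

The returned matrix `P` is a joint distribution whose column sums equal `c` exactly after the final update and whose row sums approach `r` as the iterations proceed. The same alternating rescaling of rows and columns is the classical approach to the Matrix Scaling problem named in the abstract.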

Date issued

2022-09

URI

https://hdl.handle.net/1721.1/150436

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Collections

Doctoral Theses