Transport and Beyond: Efficient Optimization over Probability Distributions
Author(s)
Altschuler, Jason M.
Advisor
Parrilo, Pablo A.
Abstract
The core of classical optimization focuses on the setting where decision variables are vectors in Rⁿ. However, modern applications throughout machine learning, applied mathematics, and engineering demand high-dimensional optimization problems where decision variables are probability distributions. Can such optimization problems be solved efficiently? This thesis presents two interrelated lines of work in this direction through the common thread of Optimal Transport. A unifying theme is the optimization of joint probability distributions with constrained marginals.
Part I of this thesis considers Optimal Transport and other optimization problems over joint distributions with two constrained marginals. Such tasks are fundamental in alignment problems, matrix problems, graph problems, and more. Chapters 2-4 establish near-linear runtimes for approximation algorithms for several classical problems under this umbrella: Optimal Transport, Minimum-Mean-Cycle, Matrix Balancing, and Matrix Scaling. Two recurring key themes are the use of entropic regularization for exploiting separability of optimization constraints, and the use of probabilistic inequalities for obtaining dimension-free convergence bounds. A dictionary is presented that unifies these various problems, which were historically studied in disparate communities.
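As an illustration of how entropic regularization exploits the separability of the marginal constraints, the following is a minimal sketch of the classical Sinkhorn iteration for Optimal Transport between two discrete distributions. It is not taken from the thesis: the function names, the regularization parameter `eta`, and the fixed iteration count `iters` are assumptions chosen for illustration. Each update rescales only the rows or only the columns, which is the same alternating scheme used for Matrix Scaling.

```python
import math

def sinkhorn(cost, r, c, eta=10.0, iters=500):
    """Approximate an optimal transport plan with row marginals r and column marginals c.

    Illustrative sketch only: eta (regularization strength) and iters
    (number of sweeps) are hypothetical defaults, not tuned values.
    """
    n, m = len(r), len(c)
    # Entrywise Gibbs kernel K = exp(-eta * cost).
    K = [[math.exp(-eta * cost[i][j]) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Row step: rescale so the row sums of diag(u) K diag(v) equal r.
        u = [r[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Column step: rescale so the column sums equal c.
        v = [c[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

def transport_cost(cost, plan):
    """Total cost <cost, plan> of a transport plan."""
    return sum(cost[i][j] * plan[i][j]
               for i in range(len(plan)) for j in range(len(plan[0])))

# Toy instance: two points, moving mass between them costs 1.
cost = [[0.0, 1.0], [1.0, 0.0]]
r = [0.5, 0.5]
c = [0.5, 0.5]
plan = sinkhorn(cost, r, c)
```

On this toy instance the returned plan places almost all mass on the diagonal, so `transport_cost(cost, plan)` is close to zero. Larger `eta` brings the plan closer to an exact optimum, at the price of smaller kernel entries and typically more iterations.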
Part II of this thesis considers Multimarginal Optimal Transport (MOT) and other optimization problems over joint distributions with many constrained marginals. Despite the syntactic similarities with the problems in Part I, these problems require fundamentally different algorithms and analyses. The key issue limiting the many applications of MOT is that in general, MOT requires exponential time in the number of marginals k and their support sizes n. Chapters 5-6 develop a general theory about what "structure" makes MOT solvable in time that is polynomial in n and k. We demonstrate this general theory on applications in diverse fields ranging from operations research to data science to fluid dynamics to quantum chemistry. Chapter 7 dedicates special attention to the popular MOT application of Wasserstein barycenters, resolving the complexity of this problem and uncovering the subtle dependence of the answer on the dimension.
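For reference, the MOT problem can be written in a standard form as below. The notation is an assumption made for illustration and is not taken from the abstract: C is a cost tensor, the μ_i are the k prescribed marginals (each supported on n points), and M(μ_1, ..., μ_k) denotes the nonnegative tensors whose i-th marginal equals μ_i.

```latex
\min_{P \in \mathcal{M}(\mu_1, \dots, \mu_k)} \langle C, P \rangle
  \;=\; \sum_{j_1, \dots, j_k = 1}^{n} C_{j_1 \cdots j_k} \, P_{j_1 \cdots j_k},
\qquad
\mathcal{M}(\mu_1, \dots, \mu_k)
  = \Bigl\{ P \ge 0 \;:\;
      \sum_{j_1, \dots, j_{i-1}, j_{i+1}, \dots, j_k} P_{j_1 \cdots j_k} = (\mu_i)_{j_i}
      \ \text{for all } i \text{ and } j_i \Bigr\}
```

The decision variable P has n^k entries, so even writing it down explicitly takes time exponential in k. This is why structure in the cost C is needed for algorithms that run in time polynomial in n and k.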
Date issued
2022-09
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology