Scalable Methodologies for Optimizing Over Probability Distributions
Author(s)
Li, Lingxiao
Advisor
Solomon, Justin
Abstract
Modern machine learning applications, such as generative modeling and probabilistic inference, demand a new generation of methodologies for optimizing over the space of probability distributions, where the optimization variable represents a weighted population of potentially infinitely many points. Despite the ubiquity of these distributional optimization problems, there has been a shortage of scalable methods grounded in mathematical principles. To bridge this gap, this thesis introduces two complementary lines of work for scalable distributional optimization.

The first part of the thesis focuses on optimizing over discrete distributions to generate high-quality samples for probabilistic inference. We present two works that tackle sampling by optimizing pairwise interaction energies defined on a collection of particles. The first work designs a new family of mollified interaction energies over moving particles, offering a unified framework for constrained and unconstrained sampling. The second work develops scalable optimization of a popular family of interaction energies, the maximum mean discrepancy of mean-zero kernels, to generate high-quality coresets from millions of biased samples, obtaining unbiased coresets with better-than-i.i.d. quality.

The second part transitions to optimizing over continuous distributions through neural network parameterization, enabling the generation of an endless stream of samples once optimization completes. We exploit convexity principles to identify suitable mathematical formulations and scalable optimization algorithms in three contexts: 1) averaging distributions in a geometrically meaningful manner using a regularized dual formulation of the Wasserstein barycenter problem; 2) identifying local minima of non-convex optimization problems with a generative model, by learning proximal operators with global convergence guarantees; and 3) solving mass-conserving differential equations of probability flows without temporal or spatial discretization by leveraging the self-consistency of the dynamical system.
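As a rough illustration of the interaction-energy objective underlying the coreset work, the sketch below estimates the squared maximum mean discrepancy between a candidate coreset and a large reference sample. It is a minimal sketch only: the Gaussian kernel, the bandwidth, and the function names are illustrative assumptions, not the thesis's mean-zero-kernel formulation or its scalable optimizer.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    # Pairwise Gaussian (RBF) kernel matrix between point sets a (n x d) and b (m x d).
    sq_dists = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-sq_dists / (2.0 * bandwidth**2))

def mmd_squared(coreset, reference, bandwidth=1.0):
    # Biased (V-statistic) estimate of squared maximum mean discrepancy
    # between a small candidate coreset and a large reference sample.
    k_xx = gaussian_kernel(coreset, coreset, bandwidth).mean()
    k_yy = gaussian_kernel(reference, reference, bandwidth).mean()
    k_xy = gaussian_kernel(coreset, reference, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy

if __name__ == "__main__":
    # Hypothetical usage: compare a 100-point random subset against a 5000-point sample.
    rng = np.random.default_rng(0)
    reference = rng.normal(size=(5000, 2))
    coreset = reference[rng.choice(5000, size=100, replace=False)]
    print(mmd_squared(coreset, reference, bandwidth=1.0))
```

A coreset construction would then seek particle positions (or subset selections) that drive this discrepancy down faster than i.i.d. subsampling can.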
Date issued
2024-05
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology