Scalable Methodologies for Optimizing Over Probability Distributions

Author(s)
Li, Lingxiao
Thesis PDF (31.39 MB)
Advisor
Solomon, Justin
Terms of use
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0). Copyright retained by author(s). https://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract
Modern machine learning applications, such as generative modeling and probabilistic inference, demand a new generation of methodologies for optimizing over the space of probability distributions, where the optimization variable represents a weighted population of potentially infinitely many points. Despite the ubiquity of these distributional optimization problems, there has been a shortage of scalable methods grounded in mathematical principles. To bridge this gap, this thesis introduces two complementary lines of work for scalable distributional optimization. The first part of this thesis focuses on optimizing over discrete distributions to generate high-quality samples for probabilistic inference. We present two works that tackle sampling by optimizing pairwise interaction energies defined on a collection of particles. The first work designs a new family of mollified interaction energies over moving particles, offering a unified framework for constrained and unconstrained sampling. The second work focuses on the scalable optimization of a family of popular interaction energies—maximum mean discrepancy with mean-zero kernels—to generate high-quality coresets from millions of biased samples, obtaining better-than-i.i.d. unbiased coresets. The second part transitions to optimizing over continuous distributions through neural network parameterization, enabling the generation of endless streams of samples once trained. We exploit convexity principles to identify suitable mathematical formulations and scalable optimization algorithms in three contexts: 1) averaging distributions in a geometrically meaningful manner using a regularized dual formulation of the Wasserstein barycenter problem; 2) identifying local minima of non-convex optimization problems, framed as a generative modeling task, by learning proximal operators with global convergence guarantees; and 3) solving mass-conserving differential equations of probability flows without temporal or spatial discretization by leveraging the self-consistency of the dynamical system.
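
To make the particle-based viewpoint of the first part concrete, the sketch below runs plain gradient descent on a small particle set to minimize a squared maximum mean discrepancy (MMD) against a fixed set of target samples, using a Gaussian kernel. This is a minimal illustrative sketch under assumed choices (Gaussian kernel, NumPy, fixed step size); it is not the thesis's mollified interaction energies or its coreset algorithm, and all function names and parameters below are hypothetical.

    import numpy as np

    # Illustrative only: gradient descent on a particle set to minimize a squared
    # maximum mean discrepancy (MMD) against fixed target samples, with a Gaussian
    # kernel. Names, parameters, and algorithmic choices are assumptions made for
    # illustration, not the methods developed in the thesis.

    def gaussian_kernel(x, y, h=1.0):
        # k(x, y) = exp(-||x - y||^2 / (2 h^2)), evaluated pairwise.
        d2 = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
        return np.exp(-d2 / (2.0 * h * h))

    def mmd_sq(particles, target, h=1.0):
        # Biased (V-statistic) estimate of the squared MMD between the particle
        # and target empirical measures.
        kxx = gaussian_kernel(particles, particles, h).mean()
        kxy = gaussian_kernel(particles, target, h).mean()
        kyy = gaussian_kernel(target, target, h).mean()
        return kxx - 2.0 * kxy + kyy

    def mmd_grad(particles, target, h=1.0):
        # Gradient of the squared MMD with respect to each particle position.
        n, m = len(particles), len(target)
        kxx = gaussian_kernel(particles, particles, h)        # (n, n)
        kxy = gaussian_kernel(particles, target, h)           # (n, m)
        dxx = particles[:, None, :] - particles[None, :, :]   # (n, n, d)
        dxy = particles[:, None, :] - target[None, :, :]      # (n, m, d)
        # grad_x k(x, z) = -(x - z) / h^2 * k(x, z) for the Gaussian kernel.
        gxx = -(kxx[..., None] * dxx).sum(axis=1) / (h * h)
        gxy = -(kxy[..., None] * dxy).sum(axis=1) / (h * h)
        return 2.0 * gxx / (n * n) - 2.0 * gxy / (n * m)

    rng = np.random.default_rng(0)
    target = rng.normal(loc=2.0, scale=0.5, size=(1000, 2))  # target samples
    particles = rng.normal(size=(64, 2))                      # particles to optimize

    for _ in range(1000):
        particles -= 0.5 * mmd_grad(particles, target)

    print("squared MMD after optimization:", float(mmd_sq(particles, target)))

With these choices, the particles drift from their initialization toward the target samples; swapping in a different kernel, constraining the particles, or reweighting them changes the character of the solution, which is the kind of design space the first part of the thesis investigates at scale.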
Date issued
2024-05
URI
https://hdl.handle.net/1721.1/156585
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Doctoral Theses
