New tools for Bayesian optimal experimental design and kernel-based generative modeling
Author(s)
Li, Fengyi
Download: Thesis PDF (15.32 MB)
Advisor
Marzouk, Youssef
Abstract
This thesis develops new computational approaches for two canonical problems in statistics and machine learning: optimal experimental design and generative modeling.
Optimal experimental design (OED) is important to model development across science and engineering, especially when resource limitations allow only a small number of observations to be taken or experiments to be performed. In the Bayesian setting, a useful criterion for the importance of candidate experiments is the expected information gain (EIG) from prior to posterior, or equivalently, the mutual information (MI) between candidate observations and the parameters of interest. Yet estimating EIG for a given design can be quite challenging in nonlinear/non-Gaussian models, and for high-dimensional parameters and observations.
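For reference, the criterion can be stated compactly (the notation here is ours, not the thesis's): for a design d, parameters \theta, and observations y,

    \mathrm{EIG}(d) \;=\; \mathbb{E}_{y \mid d}\!\left[ D_{\mathrm{KL}}\!\big( \pi(\theta \mid y, d) \,\|\, \pi(\theta) \big) \right] \;=\; I(\theta; y \mid d),

i.e., the Kullback–Leibler divergence from prior to posterior, averaged over the prior predictive distribution of the data, which equals the mutual information between parameters and observations under design d.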
In the first part of the thesis, we introduce new methods for estimating EIG based on transportation of measure. Specifically, we use marginal and conditional density estimates, obtained with semi-parametric transport models, in a Monte Carlo estimator. The density estimates are obtained by solving convex optimization problems. This framework is also compatible with implicit models, where one can simulate from the likelihood or prior but the associated density functions are unknown. We identify the optimal scaling of sample sizes between the "inner" density estimation steps and the "outer" EIG estimation, and demonstrate the efficiency of these choices numerically. If the dimensions of the parameters or observations are high, however, direct density estimation becomes intractable. Here, we use gradient-based information bounds, obtained via log-Sobolev inequalities, to identify optimal projections of the parameters and observations, and then apply our transport-based EIG estimation scheme.
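A minimal sketch of the Monte Carlo structure of such an EIG estimator is given below, with Gaussian kernel density estimates standing in for the semi-parametric transport-based density estimates developed in the thesis; the toy model, sample sizes, and all function names are illustrative assumptions.

    import numpy as np
    from scipy.stats import gaussian_kde

    rng = np.random.default_rng(0)

    # Toy implicit model: theta ~ N(0, 1), y = theta**3 + noise.
    def sample_joint(n):
        theta = rng.standard_normal(n)
        y = theta**3 + 0.5 * rng.standard_normal(n)
        return theta, y

    # "Inner" step: density estimation (KDE here; transport maps in the thesis).
    n_inner = 2000
    theta_in, y_in = sample_joint(n_inner)
    p_y = gaussian_kde(y_in)                              # marginal density of y
    p_ty = gaussian_kde(np.vstack([theta_in, y_in]))      # joint density of (theta, y)
    p_t = gaussian_kde(theta_in)                          # marginal density of theta

    # "Outer" step: Monte Carlo average of log p(y | theta) - log p(y),
    # with log p(y | theta) = log p(theta, y) - log p(theta).
    n_outer = 2000
    theta_out, y_out = sample_joint(n_outer)
    log_cond = p_ty.logpdf(np.vstack([theta_out, y_out])) - p_t.logpdf(theta_out)
    log_marg = p_y.logpdf(y_out)
    eig_estimate = np.mean(log_cond - log_marg)
    print(f"estimated EIG: {eig_estimate:.3f}")

The split between n_inner and n_outer is the sample-size allocation whose optimal scaling is analyzed in the thesis; the values above are arbitrary.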
We next study the problem of cardinality-constrained observation selection to maximize MI in non-Gaussian settings, i.e., choosing the most informative subset of k observations from a candidate pool of size n > k. Finding the exact solution to this combinatorial optimization problem is computationally costly, so we resort to greedy approaches based on computationally inexpensive lower bounds for MI. Here we again use log-Sobolev inequalities to construct such lower bounds for certain classes of non-Gaussian distributions, and exploit these lower bounds within the combinatorial selection problem. We demonstrate that our method outperforms random selection strategies and Gaussian approximations in many settings, including challenging nonlinear design problems with non-additive noise.
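A minimal sketch of the greedy selection loop follows, using the closed-form Gaussian MI (a log-determinant) as a stand-in score; the thesis replaces this score with inexpensive log-Sobolev-based lower bounds valid for certain non-Gaussian models. The linear-Gaussian setup and parameter values below are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    n_obs, d_param, k = 20, 5, 4

    G = rng.standard_normal((n_obs, d_param))   # linear forward model
    Sigma_theta = np.eye(d_param)               # prior covariance
    sigma2 = 0.1                                # i.i.d. observation noise variance

    def mi_score(subset):
        """Closed-form Gaussian MI I(theta; y_S); stand-in for an MI lower bound."""
        Gs = G[list(subset)]
        C = Gs @ Sigma_theta @ Gs.T + sigma2 * np.eye(len(subset))
        return 0.5 * (np.linalg.slogdet(C)[1] - len(subset) * np.log(sigma2))

    # Greedy forward selection: add the observation giving the largest score gain.
    selected = []
    for _ in range(k):
        remaining = [i for i in range(n_obs) if i not in selected]
        best = max(remaining, key=lambda i: mi_score(selected + [i]))
        selected.append(best)
    print("greedily selected observations:", selected)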
In the second part of the thesis, we turn our attention to generative modeling, which can be understood as the problem of drawing new samples from an unknown distribution, from which a fixed sample is available. Our approaches employ kernel-type algorithms based on diffusion maps.
First, we propose an interacting particle system for generative modeling, based on diffusion maps and Laplacian-adjusted Wasserstein gradient descent (LAWGD). Diffusion maps are used to approximate the generator of the corresponding Langevin diffusion process from samples, and hence to learn the underlying data-generating manifold. LAWGD enables efficient sampling from the target distribution given the generator of the Langevin diffusion process, which we construct here via a kernel-based spectral approximation computed with diffusion maps. Our method requires no offline training and minimal tuning, and can outperform other approaches on data sets of moderate dimension.
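A minimal sketch of the diffusion-map ingredient of this scheme is given below: a Gaussian kernel matrix built from samples is density-normalized and row-normalized, and the scaled eigendecomposition of the resulting Markov matrix approximates the spectrum of the Langevin generator. The LAWGD particle update built on that spectral approximation is omitted, and the normalization exponent, bandwidth, and scaling convention below are illustrative assumptions.

    import numpy as np

    def diffusion_map_generator(X, eps, alpha=0.5, n_eig=10):
        """Approximate the spectrum of the Langevin generator from samples X (n, d).

        Gaussian kernel + density normalization (alpha = 1/2, the 'Fokker-Planck'
        normalization) + row normalization; eigenvalues of (P - I)/eps approximate
        generator eigenvalues up to a convention-dependent constant factor.
        """
        sq = np.sum((X[:, None, :] - X[None, :, :])**2, axis=-1)
        K = np.exp(-sq / (2.0 * eps))
        d = K.sum(axis=1)
        K_alpha = K / np.outer(d**alpha, d**alpha)        # remove sampling-density bias
        P = K_alpha / K_alpha.sum(axis=1, keepdims=True)  # row-stochastic matrix
        evals, evecs = np.linalg.eig(P)
        order = np.argsort(-evals.real)[:n_eig]
        gen_evals = (evals.real[order] - 1.0) / eps       # generator eigenvalues (<= 0)
        return gen_evals, evecs.real[:, order]

    # Example: samples from a 2-D standard Gaussian
    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 2))
    gen_evals, gen_evecs = diffusion_map_generator(X, eps=0.2)
    print("approximate generator eigenvalues:", np.round(gen_evals, 2))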
Second, we propose a generative model combining diffusion maps and Langevin dynamics. Diffusion maps are used to approximate the drift term from the available training samples, which is then implemented in a discrete-time Langevin sampler to generate new samples. By setting the kernel bandwidth to match the time step size used in the unadjusted Langevin algorithm, our method effectively circumvents any stability issues typically associated with time-stepping stiff stochastic differential equations. We demonstrate the performance of our proposed scheme through experiments on synthetic datasets of increasing dimension, and on a conditional sampling problem arising in stochastic subgrid-scale parametrization of a dynamical system.
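A minimal sketch of this second scheme follows, with the drift estimated by a standard first-order kernel approximation of the score from the training samples and the kernel bandwidth set equal to the Langevin step size; the specific drift formula, target, and parameters are illustrative assumptions rather than the thesis's exact construction.

    import numpy as np

    rng = np.random.default_rng(0)

    # Training samples from the (unknown) target: a bimodal 1-D mixture.
    X_train = np.concatenate([rng.normal(-2, 0.5, 500), rng.normal(2, 0.5, 500)])

    h = 0.05      # Langevin step size
    eps = h       # kernel bandwidth matched to the step size

    def drift(x):
        """Kernel estimate of grad log pi(x): weighted displacement toward the data."""
        logw = -(X_train[None, :] - x[:, None])**2 / (2.0 * eps)
        logw -= logw.max(axis=1, keepdims=True)           # numerical stability
        w = np.exp(logw)
        w /= w.sum(axis=1, keepdims=True)
        return (w @ X_train - x) / eps

    # Unadjusted Langevin algorithm with the estimated drift.
    x = rng.standard_normal(2000)                         # initial particles
    for _ in range(200):
        x = x + h * drift(x) + np.sqrt(2.0 * h) * rng.standard_normal(x.shape)

    print("generated sample mean/std:", x.mean(), x.std())

Because eps equals h, the deterministic part of each update, x + h * drift(x), is exactly a kernel-weighted average of training points, so it cannot overshoot; this is one way to see the stabilization effect of matching the bandwidth to the step size.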
Date issued
2024-09
Department
Massachusetts Institute of Technology. Department of Aeronautics and Astronautics
Publisher
Massachusetts Institute of Technology