Probabilistic modeling and Bayesian inference via triangular transport
Author(s)
Baptista, Ricardo Miguel
Advisor
Marzouk, Youssef
Abstract
Probabilistic modeling and Bayesian inference in non-Gaussian settings are pervasive challenges for science and engineering applications. Transportation of measure provides a principled framework for treating non-Gaussianity and for generalizing many methods that rest on Gaussian assumptions. A transport map deterministically couples a simple reference distribution (e.g., a standard Gaussian) to a complex target distribution via a bijective transformation. Finding such a map enables efficient sampling from the target distribution and immediate access to its density. Triangular maps comprise a general class of transports that are attractive from the perspectives of analysis, modeling, and computation. This thesis: (1) develops a general representation for monotone triangular maps, and adaptive methodologies for estimating such maps (and their associated pushforward densities) from samples; (2) uses triangular maps and their compositions to perform Bayesian computation in likelihood-free settings, including new ensemble methods for nonlinear filtering; and (3) proposes parameter and data dimension reduction techniques with error guarantees for high-dimensional inverse problems.
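As a small illustration of how a triangular map gives both samples and a density, the sketch below uses a hand-picked two-dimensional map (a "banana" transformation). The functions `T`, `S`, and `log_density` are hypothetical names chosen for this example and are not taken from the thesis. The density follows from the change-of-variables formula pi(x) = eta(S(x)) |det grad S(x)|, where the determinant of a triangular map is the product of the diagonal partial derivatives.

```python
import numpy as np

def T(z):
    """Lower-triangular map from reference samples z to target samples x."""
    x1 = z[:, 0]                    # first component depends on z1 only
    x2 = z[:, 1] + z[:, 0] ** 2     # second component depends on (z1, z2)
    return np.column_stack([x1, x2])

def S(x):
    """Inverse of T; it is also lower triangular."""
    z1 = x[:, 0]
    z2 = x[:, 1] - x[:, 0] ** 2
    return np.column_stack([z1, z2])

def log_density(x):
    """Log-density of the pushforward of a standard Gaussian under T.

    Change of variables: log pi(x) = log eta(S(x)) + log|det grad S(x)|.
    Here dS1/dx1 = 1 and dS2/dx2 = 1, so the log-determinant is zero.
    """
    z = S(x)
    log_eta = -0.5 * np.sum(z ** 2, axis=1) - np.log(2.0 * np.pi)
    log_det = 0.0
    return log_eta + log_det

# Sampling: draw from the reference and push the samples through T.
rng = np.random.default_rng(0)
z = rng.standard_normal((1000, 2))
x = T(z)
```

In practice the map is not known in closed form and has to be estimated from samples; the example only shows why having the map makes sampling and density evaluation cheap.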
The first part of the thesis explores the use of triangular transport maps for density estimation and for learning probabilistic graphical models. To construct triangular maps, we represent monotone functions as smooth transformations of unconstrained (non-monotone) functions. We show how certain structural choices for these transformations lead to smooth optimization problems with no spurious local minima, i.e., where all local minima are global minima. Given samples, we then propose an adaptive algorithm that estimates maps with sparse variable dependence. We demonstrate how this framework enables joint and conditional density estimation across a range of sample sizes, and how it can explicitly learn the Markov properties of a continuous non-Gaussian distribution. To this end, we introduce a consistent estimator for the Markov structure based on integrated Hessian information from the log-density. We then propose an iterative algorithm for learning sparse graphical models by exploiting a corresponding sparsity structure in triangular maps.
A core advantage of triangular maps is that their components expose conditionals of the target distribution. Hence, learning a map that depends on both parameters and observations enables efficient sampling from the posterior distribution in a Bayesian inference problem. Crucially, this can be done without evaluating the likelihood function, which is often inaccessible or computationally prohibitive in scientific applications (as with forward models given by stochastic partial differential equations, which we consider here). In the second part of this thesis, we propose and analyze a specific composition of transport maps that directly transforms prior samples into posterior samples. We show that this approach, termed the stochastic map (SM) algorithm, improves over other transport-based methods for conditional sampling by reducing the bias and variance of the associated posterior approximation.
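One way to write down a composition of this kind is sketched below. The notation is an assumption made for illustration and is not taken from the abstract: y denotes observations, x parameters, y^* the observed data, and \eta a standard Gaussian reference. The sketch relies only on the property stated above, that the components of a triangular map expose conditionals of the target.

```latex
% Assumed notation: S is a lower-triangular map pushing the joint
% distribution of (y, x) forward to the standard Gaussian \eta.
S(y, x) =
\begin{bmatrix}
  S^{\mathcal{Y}}(y) \\
  S^{\mathcal{X}}(y, x)
\end{bmatrix},
\qquad
S_{\sharp}\, \pi_{Y, X} = \eta .

% For each fixed y, the map x \mapsto S^{\mathcal{X}}(y, x) pushes the
% conditional \pi_{X \mid Y = y} forward to \eta_{X}. Composing with the
% inverse at y = y^* therefore sends a joint sample (y, x) to a sample
% from the posterior \pi_{X \mid Y = y^*}:
T_{y^*}(y, x) = S^{\mathcal{X}}(y^*, \cdot)^{-1} \bigl( S^{\mathcal{X}}(y, x) \bigr).
```

Only samples of (y, x) from the joint distribution are needed to estimate S, which is why no likelihood evaluation is required.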
We then use the SM algorithm to sequentially estimate the state of a chaotic dynamical system given online observations, a nonlinear filtering problem known in geophysical applications as “data assimilation” (DA). We show that when the SM algorithm is restricted to linear maps, it reduces to the ensemble Kalman filter (EnKF), a workhorse algorithm for DA; with nonlinear updates, however, the SM algorithm substantially improves on the performance of the EnKF in challenging regimes.
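To make the linear case concrete, the following is a minimal sketch of a perturbed-observation EnKF analysis step written as a linear map acting on joint samples. It assumes each prior sample x_i comes with a simulated observation y_i that already includes observation noise; the function name `enkf_update` and its argument names are hypothetical and not from the thesis. Each sample is moved by x_i + K (y^* - y_i), where K is the sample cross-covariance of (x, y) times the inverse sample covariance of y.

```python
import numpy as np

def enkf_update(X, Y, y_star):
    """Perturbed-observation EnKF analysis step (illustrative sketch).

    X      : (N, d) array of prior state samples
    Y      : (N, m) array of simulated observations, noise included
    y_star : (m,) array holding the actual observation
    Returns an (N, d) array of updated (analysis) samples.
    """
    N = X.shape[0]
    Xc = X - X.mean(axis=0)          # centered states
    Yc = Y - Y.mean(axis=0)          # centered simulated observations
    C_xy = Xc.T @ Yc / (N - 1)       # (d, m) sample cross-covariance
    C_yy = Yc.T @ Yc / (N - 1)       # (m, m) sample covariance of y
    # Gain K = C_xy C_yy^{-1}, computed with a linear solve instead of an inverse.
    K = np.linalg.solve(C_yy, C_xy.T).T
    # Linear update applied to every joint sample (x_i, y_i).
    return X + (y_star - Y) @ K.T

# Scalar linear-Gaussian check: x ~ N(0, 1), y = x + noise, noise ~ N(0, 0.25).
rng = np.random.default_rng(0)
X_prior = rng.standard_normal((20000, 1))
Y_sim = X_prior + 0.5 * rng.standard_normal((20000, 1))
X_post = enkf_update(X_prior, Y_sim, np.array([1.0]))
```

For this scalar check the exact posterior given y^* = 1 is Gaussian with mean 0.8 and variance 0.2, so the updated samples should roughly match those moments. A nonlinear update would replace the affine map with a nonlinear one estimated from the same joint samples.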
Finally, we extend the use of transport for high-dimensional inference problems by developing a joint dimension reduction strategy for parameters and observations. We identify relevant low-dimensional projections of these variables by minimizing an information-theoretic upper bound on the error in the posterior approximation. We show that this approach reduces to canonical correlation analysis in the linear-Gaussian setting, while outperforming standard dimension reduction strategies in a variety of nonlinear and non-Gaussian inference problems.
Date issued
2022-05
Department
Massachusetts Institute of Technology. Department of Aeronautics and Astronautics
Publisher
Massachusetts Institute of Technology