Proximal Gradient Algorithms for Gaussian Variational Inference: Optimization in the Bures–Wasserstein Space
Author(s)
Diao, Michael Ziyang
Advisor
Moitra, Ankur
Chewi, Sinho
Abstract
Variational inference (VI) seeks to approximate a target distribution π by an element of a tractable family of distributions. Of key interest in statistics and machine learning is Gaussian VI, which approximates π by minimizing the Kullback–Leibler (KL) divergence to π over the space of Gaussians. In this work, we develop the (Stochastic) Forward-Backward Gaussian Variational Inference (FB–GVI) algorithm to solve Gaussian VI. Our approach exploits the composite structure of the KL divergence, which can be written as the sum of a smooth term (the potential) and a non-smooth term (the entropy) over the Bures–Wasserstein (BW) space of Gaussians endowed with the Wasserstein distance. For our proposed algorithm, we obtain state-of-the-art convergence guarantees when π is log-smooth and log-concave, as well as the first convergence guarantees to first-order stationary solutions when π is only log-smooth. Additionally, in the setting where the potential admits a representation as the average of many smooth component functionals, we develop and analyze a variance-reduced extension to (Stochastic) FB–GVI with improved complexity guarantees.
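The abstract describes a proximal gradient (forward-backward) scheme on the Bures–Wasserstein space: a gradient step on the smooth potential term of the KL divergence, followed by a proximal step for the entropy term. The sketch below illustrates one such step in NumPy. It is a minimal illustration, not the thesis's implementation: it assumes the forward step uses Monte Carlo estimates of E_p[∇V] and E_p[∇²V] under the current Gaussian, and that the entropic proximal (JKO) step from a Gaussian with covariance S has the closed form Σ = ½(S + 2hI + (S(S + 4hI))^{1/2}). The names fb_gvi_step, grad_V, hess_V, h, and n_samples are illustrative, not the thesis's notation.

```python
import numpy as np
from scipy.linalg import sqrtm

def fb_gvi_step(m, Sigma, grad_V, hess_V, h, n_samples=100, rng=None):
    """One forward-backward step for Gaussian VI on the Bures-Wasserstein
    space (illustrative sketch, not the thesis's implementation)."""
    rng = np.random.default_rng(rng)
    d = m.shape[0]
    I = np.eye(d)

    # Stochastic forward step: Monte Carlo estimates of E_p[grad V] and
    # E_p[hess V] under the current Gaussian p = N(m, Sigma), followed by
    # a gradient step on the smooth potential term E_p[V].
    X = rng.multivariate_normal(m, Sigma, size=n_samples)
    g = np.mean([grad_V(x) for x in X], axis=0)
    H = np.mean([hess_V(x) for x in X], axis=0)
    m_new = m - h * g
    M = I - h * H
    S = M @ Sigma @ M.T

    # Backward step: proximal (JKO) step for the non-smooth entropy term.
    # The optimizer commutes with S, giving the closed form
    #   Sigma_new = (S + 2hI + sqrt(S (S + 4hI))) / 2.
    Sigma_new = 0.5 * (S + 2 * h * I + np.real(sqrtm(S @ (S + 4 * h * I))))
    return m_new, Sigma_new


# Hypothetical usage: V(x) = 0.5 x^T A x, so the target pi is N(0, inv(A)).
A = np.array([[2.0, 0.5], [0.5, 1.0]])
m, Sigma = np.zeros(2), np.eye(2)
for _ in range(200):
    m, Sigma = fb_gvi_step(m, Sigma, grad_V=lambda x: A @ x,
                           hess_V=lambda x: A, h=0.05)
# Sigma approaches inv(A) up to Monte Carlo noise, as expected when pi is
# itself Gaussian and the KL minimizer over Gaussians is pi.
```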
Date issued
2023-06
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology