MIT Libraries DSpace@MIT

Scaling Bayesian inference : theoretical foundations and practical methods

Author(s)
Huggins, Jonathan H. (Jonathan Hunter)
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Tamara Broderick.
Terms of use
MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582
Abstract
Bayesian statistical modeling and inference allow scientists, engineers, and companies to learn from data while incorporating prior knowledge, sharing power across experiments via hierarchical models, quantifying their uncertainty about what they have learned, and making predictions about an uncertain future. While Bayesian inference is conceptually straightforward, in practice calculating expectations with respect to the posterior can rarely be done in closed form. Hence, users of Bayesian models must turn to approximate inference methods. But modern statistical applications create many challenges: the latent parameter is often high-dimensional, the models can be complex, and there are large amounts of data that may only be available as a stream or distributed across many computers. Existing algorithms have so far remained unsatisfactory because they either (1) fail to scale to large data sets, (2) provide limited approximation quality, or (3) fail to provide guarantees on the quality of inference. To simultaneously overcome these three possible limitations, I leverage the critical insight that in the large-scale setting, much of the data is redundant. Therefore, it is possible to compress data into a form that admits more efficient inference. I develop two approaches to compressing data for improved scalability. The first is to construct a coreset: a small, weighted subset of our data that is representative of the complete dataset. The second, which I call PASS-GLM, is to construct an exponential family model that approximates the original model. The data is compressed by calculating the finite-dimensional sufficient statistics of the data under the exponential family. An advantage of the compression approach to approximate inference is that an approximate likelihood substitutes for the original likelihood.
I show how such approximate likelihoods lend themselves to a priori analysis and develop general tools for proving when an approximate likelihood will lead to a high-quality approximate posterior. I apply these tools to obtain a priori guarantees on the approximate posteriors produced by PASS-GLM. Finally, for cases when users must rely on algorithms that do not have a priori accuracy guarantees, I develop a method for comparing the quality of the inferences produced by competing algorithms. The method comes equipped with provable guarantees while also being computationally efficient.
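
The idea of compressing data into finite-dimensional sufficient statistics can be illustrated with a small sketch. This is not the thesis's implementation. It assumes logistic regression with labels in {-1, +1}, a degree-2 least-squares polynomial fit to the log-likelihood term on an assumed interval [-4, 4], and hypothetical helper names (fit_quadratic, sufficient_stats, merge_stats, approx_loglik). Under those assumptions the approximate log-likelihood depends on the data only through a count, a vector, and a matrix, which can be accumulated over a stream or summed across machines.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def phi(s):
    # Logistic log-likelihood term: log sigmoid(s) = -log(1 + exp(-s)).
    return -np.logaddexp(0.0, -s)


def fit_quadratic(radius=4.0, num_points=201):
    # Least-squares quadratic fit to phi on [-radius, radius] (an assumed range).
    # Returns coefficients [b0, b1, b2] so that phi(s) ~ b0 + b1*s + b2*s**2.
    s = np.linspace(-radius, radius, num_points)
    return P.polyfit(s, phi(s), 2)


def sufficient_stats(X, y):
    # With y in {-1, +1}, (y * x.theta)**2 == (x.theta)**2, so the quadratic
    # model needs only the count, sum_n y_n x_n, and sum_n x_n x_n^T.
    return X.shape[0], X.T @ y, X.T @ X


def merge_stats(a, b):
    # Statistics from separate batches or machines simply add.
    return a[0] + b[0], a[1] + b[1], a[2] + b[2]


def approx_loglik(theta, stats, coeffs):
    # Evaluate the approximate log-likelihood without touching the raw data.
    n, sum_yx, sum_xx = stats
    b0, b1, b2 = coeffs
    return n * b0 + b1 * (theta @ sum_yx) + b2 * (theta @ sum_xx @ theta)


# Synthetic data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
theta_true = np.array([0.5, -1.0, 0.25])
prob = 1.0 / (1.0 + np.exp(-(X @ theta_true)))
y = np.where(rng.random(1000) < prob, 1.0, -1.0)

coeffs = fit_quadratic()
# Process the data in two batches, as if it arrived as a stream.
stats = merge_stats(sufficient_stats(X[:400], y[:400]),
                    sufficient_stats(X[400:], y[400:]))

theta = np.array([0.3, -0.8, 0.1])
approx = approx_loglik(theta, stats, coeffs)
exact = phi(y * (X @ theta)).sum()
print(f"approximate: {approx:.2f}  exact: {exact:.2f}")
```

After the statistics are computed, evaluating the approximate log-likelihood at any parameter value costs time independent of the number of observations. How close the approximation is depends on the polynomial degree and on how well the fitting interval covers the values of y_n x_n·θ, which this sketch does not analyze.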
Description
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018.
 
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
 
Cataloged from student-submitted PDF version of thesis.
 
Includes bibliographical references (pages 129-140).
 
Date issued
2018
URI
http://hdl.handle.net/1721.1/117836
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.

Collections
  • Doctoral Theses