dc.contributor.advisor | Tamara Broderick. | en_US
dc.contributor.author | Huggins, Jonathan H. (Jonathan Hunter) | en_US
dc.contributor.other | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. | en_US
dc.date.accessioned | 2018-09-17T14:51:43Z
dc.date.available | 2018-09-17T14:51:43Z
dc.date.copyright | 2018 | en_US
dc.date.issued | 2018 | en_US
dc.identifier.uri | http://hdl.handle.net/1721.1/117836
dc.description | Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018. | en_US
dc.description | This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. | en_US
dc.description | Cataloged from student-submitted PDF version of thesis. | en_US
dc.description | Includes bibliographical references (pages 129-140). | en_US
dc.description.abstract | Bayesian statistical modeling and inference allow scientists, engineers, and companies to learn from data while incorporating prior knowledge, sharing power across experiments via hierarchical models, quantifying their uncertainty about what they have learned, and making predictions about an uncertain future. While Bayesian inference is conceptually straightforward, in practice calculating expectations with respect to the posterior can rarely be done in closed form. Hence, users of Bayesian models must turn to approximate inference methods. But modern statistical applications create many challenges: the latent parameter is often high-dimensional, the models can be complex, and there are large amounts of data that may only be available as a stream or distributed across many computers. Existing algorithms have so far remained unsatisfactory because they either (1) fail to scale to large data sets, (2) provide limited approximation quality, or (3) fail to provide guarantees on the quality of inference. To simultaneously overcome these three possible limitations, I leverage the critical insight that in the large-scale setting, much of the data is redundant. Therefore, it is possible to compress data into a form that admits more efficient inference. I develop two approaches to compressing data for improved scalability. The first is to construct a coreset: a small, weighted subset of our data that is representative of the complete dataset. The second, which I call PASS-GLM, is to construct an exponential family model that approximates the original model. The data is compressed by calculating the finite-dimensional sufficient statistics of the data under the exponential family. An advantage of the compression approach to approximate inference is that an approximate likelihood substitutes for the original likelihood. I show how such approximate likelihoods lend themselves to a priori analysis and develop general tools for proving when an approximate likelihood will lead to a high-quality approximate posterior. I apply these tools to obtain a priori guarantees on the approximate posteriors produced by PASS-GLM. Finally, for cases when users must rely on algorithms that do not have a priori accuracy guarantees, I develop a method for comparing the quality of the inferences produced by competing algorithms. The method comes equipped with provable guarantees while also being computationally efficient. | en_US
dc.description.statementofresponsibility | by Jonathan Hunter Huggins. | en_US
dc.format.extent | 140 pages | en_US
dc.language.iso | eng | en_US
dc.publisher | Massachusetts Institute of Technology | en_US
dc.rights | MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. | en_US
dc.rights.uri | http://dspace.mit.edu/handle/1721.1/7582 | en_US
dc.subject | Electrical Engineering and Computer Science. | en_US
dc.title | Scaling Bayesian inference : theoretical foundations and practical methods | en_US
dc.type | Thesis | en_US
dc.description.degree | Ph. D. | en_US
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc | 1052123785 | en_US

