
dc.contributor.author: Huggins, Jonathan H.
dc.contributor.author: Campbell, Trevor David
dc.contributor.author: Broderick, Tamara A
dc.date.accessioned: 2021-01-27T18:50:42Z
dc.date.available: 2021-01-27T18:50:42Z
dc.date.issued: 2016-12
dc.identifier.issn: 1049-5258
dc.identifier.uri: https://hdl.handle.net/1721.1/129582
dc.description.abstract: The use of Bayesian methods in large-scale data settings is attractive because of the rich hierarchical models, uncertainty quantification, and prior specification they provide. Standard Bayesian inference algorithms are computationally expensive, however, making their direct application to large datasets difficult or infeasible. Recent work on scaling Bayesian inference has focused on modifying the underlying algorithms to, for example, use only a random data subsample at each iteration. We leverage the insight that data is often redundant to instead obtain a weighted subset of the data (called a coreset) that is much smaller than the original dataset. We can then use this small coreset in any number of existing posterior inference algorithms without modification. In this paper, we develop an efficient coreset construction algorithm for Bayesian logistic regression models. We provide theoretical guarantees on the size and approximation quality of the coreset - both for fixed, known datasets, and in expectation for a wide class of data generative models. Crucially, the proposed approach also permits efficient construction of the coreset in both streaming and parallel settings, with minimal additional effort. We demonstrate the efficacy of our approach on a number of synthetic and real-world datasets, and find that, in practice, the size of the coreset is independent of the original dataset size. Furthermore, constructing the coreset takes a negligible amount of time compared to that required to run MCMC on it. [en_US]
dc.description.sponsorship: United States. Office of Naval Research. Multidisciplinary University Research Initiative (Grant N000141110688) [en_US]
dc.language.iso: en
dc.publisher: Curran [en_US]
dc.relation.isversionof: https://papers.nips.cc/paper/2016/hash/2b0f658cbffd284984fb11d90254081f-Abstract.html [en_US]
dc.rights: Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. [en_US]
dc.source: Neural Information Processing Systems (NIPS) [en_US]
dc.title: Coresets for scalable Bayesian logistic regression [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Huggins, Jonathan H. et al. “Coresets for scalable Bayesian logistic regression.” Paper presented at the 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain, December 5-10, 2016, Curran © 2016 The Author(s) [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science [en_US]
dc.relation.journal: 30th Conference on Neural Information Processing Systems (NIPS 2016) [en_US]
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
eprint.status: http://purl.org/eprint/status/NonPeerReviewed [en_US]
dc.date.updated: 2020-12-03T17:45:31Z
dspace.orderedauthors: Huggins, JH; Campbell, T; Broderick, T [en_US]
dspace.date.submission: 2020-12-03T17:45:35Z
mit.journal.volume: 2016 [en_US]
mit.license: PUBLISHER_POLICY
mit.metadata.status: Complete
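The abstract above describes building a small weighted subset (coreset) of the data via importance sampling, so that standard posterior inference can run unchanged on the subset. A minimal illustrative sketch of that idea follows; the function name, the norm-based sensitivity proxy, and all parameters are assumptions for illustration only, not the paper's actual algorithm (which derives sharper sensitivity bounds via a clustering of the data).

```python
import numpy as np

def logistic_coreset(X, y, m, seed=None):
    """Illustrative importance-sampling coreset for logistic regression.

    Samples m points with probability proportional to a crude per-point
    "sensitivity" proxy (here, the feature-vector norm) and assigns
    importance weights so weighted sums over the coreset are unbiased
    estimates of the corresponding full-data sums.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Crude sensitivity proxy; the paper uses tighter, clustering-based bounds.
    s = np.linalg.norm(X, axis=1) + 1e-12
    p = s / s.sum()
    idx = rng.choice(n, size=m, replace=True, p=p)
    # Weight w_i = 1 / (m * p_i): E[sum of weights] equals n, and the
    # weighted log-likelihood is an unbiased estimate of the full one.
    w = 1.0 / (m * p[idx])
    return X[idx], y[idx], w
```

The returned weights would then multiply the per-datum log-likelihood terms inside whatever MCMC or variational routine is used downstream, with no other modification to the inference algorithm.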

