Bayesian Policy Search with Policy Priors

Wingate, David; Goodman, Noah D.; Roy, Daniel M.; Kaelbling, Leslie P.; Tenenbaum, Joshua B.

dc.contributor.author	Wingate, David
dc.contributor.author	Goodman, Noah D.
dc.contributor.author	Roy, Daniel M.
dc.contributor.author	Kaelbling, Leslie P.
dc.contributor.author	Tenenbaum, Joshua B.
dc.date.accessioned	2014-05-19T19:07:57Z
dc.date.available	2014-05-19T19:07:57Z
dc.date.issued	2011-07
dc.identifier.uri	http://hdl.handle.net/1721.1/87054
dc.description.abstract	We consider the problem of learning to act in partially observable, continuous-state-and-action worlds where we have abstract prior knowledge about the structure of the optimal policy in the form of a distribution over policies. Using ideas from planning-as-inference reductions and Bayesian unsupervised learning, we cast Markov Chain Monte Carlo as a stochastic, hill-climbing policy search algorithm. Importantly, this algorithm’s search bias is directly tied to the prior and its MCMC proposal kernels, which means we can draw on the full Bayesian toolbox to express the search bias, including nonparametric priors and structured, recursive processes like grammars over action sequences. Furthermore, we can reason about uncertainty in the search bias itself by constructing a hierarchical prior and reasoning about latent variables that determine the abstract structure of the policy. This yields an adaptive search algorithm—our algorithm learns to learn a structured policy efficiently. We show how inference over the latent variables in these policy priors enables intra- and intertask transfer of abstract knowledge. We demonstrate the flexibility of this approach by learning meta search biases, by constructing a nonparametric finite state controller to model memory, by discovering motor primitives using a simple grammar over primitive actions, and by combining all three.	en_US
dc.description.sponsorship	United States. Air Force Office of Scientific Research (FA9550-07-1-0075)	en_US
dc.description.sponsorship	United States. Office of Naval Research (N00014-07-1-0937)	en_US
dc.language.iso	en_US
dc.publisher	International Joint Conference on Artificial Intelligence (IJCAI)	en_US
dc.relation.isversionof	http://dx.doi.org/10.5591/978-1-57735-516-8/IJCAI11-263	en_US
dc.rights	Creative Commons Attribution-Noncommercial-Share Alike	en_US
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/4.0/	en_US
dc.source	MIT web domain	en_US
dc.title	Bayesian Policy Search with Policy Priors	en_US
dc.type	Article	en_US
dc.identifier.citation	Wingate, David, Noah D. Goodman, Daniel M. Roy, Leslie P. Kaelbling, and Joshua B. Tenenbaum. "Bayesian Policy Search with Policy Priors." Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, July 16-22, 2011, Barcelona, Spain.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science	en_US
dc.contributor.department	Massachusetts Institute of Technology. Laboratory for Information and Decision Systems	en_US
dc.contributor.mitauthor	Wingate, David	en_US
dc.contributor.mitauthor	Roy, Daniel M.	en_US
dc.contributor.mitauthor	Kaelbling, Leslie P.	en_US
dc.contributor.mitauthor	Tenenbaum, Joshua B.	en_US
dc.relation.journal	Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence	en_US
dc.eprint.version	Author's final manuscript	en_US
dc.type.uri	http://purl.org/eprint/type/ConferencePaper	en_US
eprint.status	http://purl.org/eprint/status/NonPeerReviewed	en_US
dspace.orderedauthors	Wingate, David; Goodman, Noah D.; Roy, Daniel M.; Kaelbling, Leslie P.; Tenenbaum, Joshua B.	en_US
dc.identifier.orcid	https://orcid.org/0000-0002-1925-2035
dc.identifier.orcid	https://orcid.org/0000-0001-6054-7145
mit.license	OPEN_ACCESS_POLICY	en_US
mit.metadata.status	Complete
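
Illustrative note on the abstract: the central idea, using MCMC as a stochastic hill-climbing search over policies whose bias comes from a prior, can be sketched in a few lines. The sketch below is not the authors' implementation. The toy dynamics, the one-parameter linear policy, the Gaussian prior, the temperature `beta`, and all function names are assumptions made for illustration, and the target density (prior times exponentiated return) is only one common planning-as-inference choice.

```python
import math
import random


def rollout_return(gain, horizon=20, s0=1.0):
    """Return of the linear policy a = -gain * s on toy dynamics s' = s + a."""
    s, total = s0, 0.0
    for _ in range(horizon):
        a = -gain * s
        total -= s * s + 0.1 * a * a  # quadratic state and action cost
        s = s + a
    return total


def log_prior(gain, mean=0.0, std=2.0):
    """Assumed Gaussian policy prior over the gain (unnormalized log density)."""
    return -0.5 * ((gain - mean) / std) ** 2


def log_target(gain, beta=5.0):
    """Unnormalized log density: log prior plus beta-scaled return (an assumption)."""
    return log_prior(gain) + beta * rollout_return(gain)


def mh_policy_search(n_steps=2000, step=0.3, seed=0):
    """Random-walk Metropolis over the policy parameter; tracks the best policy seen."""
    rng = random.Random(seed)
    gain = 0.0
    cur = log_target(gain)
    best_gain, best_ret = gain, rollout_return(gain)
    for _ in range(n_steps):
        prop = gain + rng.gauss(0.0, step)  # symmetric proposal kernel
        new = log_target(prop)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if new >= cur or rng.random() < math.exp(new - cur):
            gain, cur = prop, new
            ret = rollout_return(gain)
            if ret > best_ret:
                best_gain, best_ret = gain, ret
    return best_gain, best_ret


if __name__ == "__main__":
    g, r = mh_policy_search()
    print(f"best gain {g:.3f}, return {r:.3f}")
```

In this sketch the search bias lives entirely in `log_prior` and the proposal step, so swapping in a different prior or kernel changes how the search explores without touching the acceptance rule.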

Files in this item

Name: Kaelbling_Bayesian policy.pdf
Size: 173.7 KB
Format: PDF

This item appears in the following Collection(s)

MIT Open Access Articles
