Show simple item record

dc.contributor.author: Wingate, David
dc.contributor.author: Goodman, Noah D.
dc.contributor.author: Roy, Daniel M.
dc.contributor.author: Kaelbling, Leslie P.
dc.contributor.author: Tenenbaum, Joshua B.
dc.date.accessioned: 2014-05-19T19:07:57Z
dc.date.available: 2014-05-19T19:07:57Z
dc.date.issued: 2011-07
dc.identifier.uri: http://hdl.handle.net/1721.1/87054
dc.description.abstract: We consider the problem of learning to act in partially observable, continuous-state-and-action worlds where we have abstract prior knowledge about the structure of the optimal policy in the form of a distribution over policies. Using ideas from planning-as-inference reductions and Bayesian unsupervised learning, we cast Markov chain Monte Carlo as a stochastic, hill-climbing policy search algorithm. Importantly, this algorithm's search bias is directly tied to the prior and its MCMC proposal kernels, which means we can draw on the full Bayesian toolbox to express the search bias, including nonparametric priors and structured, recursive processes such as grammars over action sequences. Furthermore, we can reason about uncertainty in the search bias itself by constructing a hierarchical prior and reasoning about latent variables that determine the abstract structure of the policy. This yields an adaptive search algorithm: our algorithm learns to learn a structured policy efficiently. We show how inference over the latent variables in these policy priors enables intra- and inter-task transfer of abstract knowledge. We demonstrate the flexibility of this approach by learning meta search biases, by constructing a nonparametric finite-state controller to model memory, by discovering motor primitives using a simple grammar over primitive actions, and by combining all three. [en_US]
dc.description.sponsorship: United States. Air Force Office of Scientific Research (FA9550-07-1-0075) [en_US]
dc.description.sponsorship: United States. Office of Naval Research (N00014-07-1-0937) [en_US]
dc.language.iso: en_US
dc.publisher: International Joint Conference on Artificial Intelligence (IJCAI) [en_US]
dc.relation.isversionof: http://dx.doi.org/10.5591/978-1-57735-516-8/IJCAI11-263 [en_US]
dc.rights: Creative Commons Attribution-Noncommercial-Share Alike [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/ [en_US]
dc.source: MIT web domain [en_US]
dc.title: Bayesian Policy Search with Policy Priors [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Wingate, David, Noah D. Goodman, Daniel M. Roy, Leslie P. Kaelbling, and Joshua B. Tenenbaum. "Bayesian Policy Search with Policy Priors." Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, July 16-22, 2011, Barcelona, Spain. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Laboratory for Information and Decision Systems [en_US]
dc.contributor.mitauthor: Wingate, David [en_US]
dc.contributor.mitauthor: Roy, Daniel M. [en_US]
dc.contributor.mitauthor: Kaelbling, Leslie P. [en_US]
dc.contributor.mitauthor: Tenenbaum, Joshua B. [en_US]
dc.relation.journal: Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence [en_US]
dc.eprint.version: Author's final manuscript [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
eprint.status: http://purl.org/eprint/status/NonPeerReviewed [en_US]
dspace.orderedauthors: Wingate, David; Goodman, Noah D.; Roy, Daniel M.; Kaelbling, Leslie P.; Tenenbaum, Joshua B. [en_US]
dc.identifier.orcid: https://orcid.org/0000-0002-1925-2035
dc.identifier.orcid: https://orcid.org/0000-0001-6054-7145
mit.license: OPEN_ACCESS_POLICY [en_US]
mit.metadata.status: Complete
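
Note: the abstract above describes casting MCMC as a stochastic, hill-climbing policy search whose search bias comes from the policy prior and the proposal kernels. The following is a minimal, hypothetical Python sketch of that idea, not the authors' code. It assumes a simple parametric policy, a Gaussian prior standing in for the structured priors discussed in the paper, a user-supplied rollout_fn returning an episode's return, and a reward-scaled likelihood in the planning-as-inference style; names such as mcmc_policy_search and the temperature parameter are illustrative only.

import numpy as np

def log_prior(theta):
    # Assumed Gaussian prior over policy parameters; the paper's point is that
    # richer structured priors (nonparametric, grammar-based) would slot in here.
    return -0.5 * float(np.sum(theta ** 2))

def expected_return(theta, rollout_fn, n_rollouts=10):
    # Monte Carlo estimate of expected return under the policy with parameters theta.
    return float(np.mean([rollout_fn(theta) for _ in range(n_rollouts)]))

def mcmc_policy_search(rollout_fn, dim, n_iters=1000, temperature=10.0, step=0.1, seed=0):
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=dim)
    score = temperature * expected_return(theta, rollout_fn) + log_prior(theta)
    best_theta, best_score = theta, score
    for _ in range(n_iters):
        # Proposal kernel: its structure (here a simple random walk) is part of the search bias.
        proposal = theta + step * rng.normal(size=dim)
        prop_score = temperature * expected_return(proposal, rollout_fn) + log_prior(proposal)
        # Metropolis accept/reject on the (unnormalized) log posterior:
        # behaves like stochastic hill climbing that can escape local optima.
        if np.log(rng.uniform()) < prop_score - score:
            theta, score = proposal, prop_score
            if score > best_score:
                best_theta, best_score = theta, score
    return best_theta

In this sketch the prior and the proposal kernel jointly determine where the search concentrates, which is the mechanism the abstract refers to when it says the search bias can be expressed with the full Bayesian toolbox.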

