Show simple item record

dc.contributor.author	Pineau, Joelle
dc.contributor.author	Doshi-Velez, Finale P
dc.contributor.author	Roy, Nicholas
dc.date.accessioned	2017-04-20T17:54:32Z
dc.date.available	2017-04-20T17:54:32Z
dc.date.issued	2012-04
dc.date.submitted	2012-02
dc.identifier.issn	0004-3702
dc.identifier.uri	http://hdl.handle.net/1721.1/108303
dc.description.abstract	Acting in domains where an agent must plan several steps ahead to achieve a goal can be a challenging task, especially if the agent's sensors provide only noisy or partial information. In this setting, Partially Observable Markov Decision Processes (POMDPs) provide a planning framework that optimally trades between actions that contribute to the agent's knowledge and actions that increase the agent's immediate reward. However, the task of specifying the POMDP's parameters is often onerous. In particular, setting the immediate rewards to achieve a desired balance between information-gathering and acting is often not intuitive. In this work, we propose an approximation based on minimizing the immediate Bayes risk for choosing actions when transition, observation, and reward models are uncertain. The Bayes-risk criterion avoids the computational intractability of solving a POMDP with a multi-dimensional continuous state space; we show it performs well in a variety of problems. We use policy queries—in which we ask an expert for the correct action—to infer the consequences of a potential pitfall without experiencing its effects. More important for human–robot interaction settings, policy queries allow the agent to learn the reward model without the reward values ever being specified.	en_US
dc.language.iso	en_US
dc.publisher	Elsevier	en_US
dc.relation.isversionof	http://dx.doi.org/10.1016/j.artint.2012.04.006	en_US
dc.rights	Creative Commons Attribution-NonCommercial-NoDerivs License	en_US
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	en_US
dc.source	Prof. Roy via Barbara Williams	en_US
dc.title	Reinforcement learning with limited reinforcement: Using Bayes risk for active learning in POMDPs	en_US
dc.type	Article	en_US
dc.identifier.citation	Doshi-Velez, Finale; Pineau, Joelle and Roy, Nicholas. “Reinforcement Learning with Limited Reinforcement: Using Bayes Risk for Active Learning in POMDPs.” Artificial Intelligence 187–188 (August 2012): 115–132. © 2012 Elsevier B.V.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Aeronautics and Astronautics	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science	en_US
dc.contributor.approver	Roy, Nicholas	en_US
dc.contributor.mitauthor	Doshi-Velez, Finale P
dc.contributor.mitauthor	Roy, Nicholas
dc.relation.journal	Artificial Intelligence	en_US
dc.eprint.version	Author's final manuscript	en_US
dc.type.uri	http://purl.org/eprint/type/JournalArticle	en_US
eprint.status	http://purl.org/eprint/status/PeerReviewed	en_US
dspace.orderedauthors	Doshi-Velez, Finale; Pineau, Joelle; Roy, Nicholas	en_US
dspace.embargo.terms	N	en_US
dc.identifier.orcid	https://orcid.org/0000-0002-8293-0492
mit.license	PUBLISHER_CC	en_US
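
The abstract above describes choosing actions by minimizing the immediate Bayes risk over uncertain POMDP models, falling back on a policy query to an expert when acting is too risky. The sketch below is purely illustrative and is not the authors' implementation: it assumes the model posterior is represented by a set of weighted sampled models with per-model action-value estimates, and the names (bayes_risk_action, q_values, query_cost) are hypothetical.

# Illustrative sketch (not from the paper): Bayes-risk action selection over
# a posterior of sampled POMDP models, with a policy query to an expert when
# acting on the uncertain models is riskier than asking.
import numpy as np

def bayes_risk_action(weights, q_values, query_cost):
    """Pick an action by minimizing immediate Bayes risk, or ask the expert.

    weights    -- posterior weight of each sampled model (sums to 1)   [assumed]
    q_values   -- q_values[i][a]: value of action a under sampled
                  model i at that model's current belief               [assumed]
    query_cost -- cost of a policy query (asking the expert)           [assumed]
    """
    q = np.asarray(q_values, dtype=float)    # shape (n_models, n_actions)
    w = np.asarray(weights, dtype=float)

    # Risk of action a: expected value lost, under the model posterior,
    # relative to each sampled model's own best action.
    per_model_best = q.max(axis=1)                      # (n_models,)
    risk = w @ (per_model_best[:, None] - q)            # (n_actions,)

    best = int(np.argmin(risk))
    if risk[best] > query_cost:
        return "query"        # ask the expert for the correct action instead
    return best

# Example: two equally weighted models that disagree about the best action.
print(bayes_risk_action([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]], query_cost=0.3))
# -> "query": every ordinary action risks losing 0.5 > 0.3 in expectation.

Under this reading, the agent only issues a policy query when every ordinary action is expected to lose more value, averaged over the model posterior, than the fixed cost of asking the expert.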

