Reinforcement learning with limited reinforcement: Using Bayes risk for active learning in POMDPs

Doshi-Velez, Finale; Pineau, Joelle; Roy, Nicholas

dc.contributor.author	Pineau, Joelle
dc.contributor.author	Doshi-Velez, Finale P
dc.contributor.author	Roy, Nicholas
dc.date.accessioned	2017-04-20T17:54:32Z
dc.date.available	2017-04-20T17:54:32Z
dc.date.issued	2012-04
dc.date.submitted	2012-02
dc.identifier.issn	0004-3702
dc.identifier.uri	http://hdl.handle.net/1721.1/108303
dc.description.abstract	Acting in domains where an agent must plan several steps ahead to achieve a goal can be a challenging task, especially if the agentʼs sensors provide only noisy or partial information. In this setting, Partially Observable Markov Decision Processes (POMDPs) provide a planning framework that optimally trades between actions that contribute to the agentʼs knowledge and actions that increase the agentʼs immediate reward. However, the task of specifying the POMDPʼs parameters is often onerous. In particular, setting the immediate rewards to achieve a desired balance between information-gathering and acting is often not intuitive. In this work, we propose an approximation based on minimizing the immediate Bayes risk for choosing actions when transition, observation, and reward models are uncertain. The Bayes-risk criterion avoids the computational intractability of solving a POMDP with a multi-dimensional continuous state space; we show it performs well in a variety of problems. We use policy queries—in which we ask an expert for the correct action—to infer the consequences of a potential pitfall without experiencing its effects. More important for human–robot interaction settings, policy queries allow the agent to learn the reward model without the reward values ever being specified.	en_US
dc.language.iso	en_US
dc.publisher	Elsevier	en_US
dc.relation.isversionof	http://dx.doi.org/10.1016/j.artint.2012.04.006	en_US
dc.rights	Creative Commons Attribution-NonCommercial-NoDerivs License	en_US
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	en_US
dc.source	Prof. Roy via Barbara Williams	en_US
dc.title	Reinforcement learning with limited reinforcement: Using Bayes risk for active learning in POMDPs	en_US
dc.type	Article	en_US
dc.identifier.citation	Doshi-Velez, Finale; Pineau, Joelle and Roy, Nicholas. “Reinforcement Learning with Limited Reinforcement: Using Bayes Risk for Active Learning in POMDPs.” Artificial Intelligence 187–188 (August 2012): 115–132. © 2012 Elsevier B.V.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Aeronautics and Astronautics	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science	en_US
dc.contributor.approver	Roy, Nicholas	en_US
dc.contributor.mitauthor	Doshi-Velez, Finale P
dc.contributor.mitauthor	Roy, Nicholas
dc.relation.journal	Artificial Intelligence	en_US
dc.eprint.version	Author's final manuscript	en_US
dc.type.uri	http://purl.org/eprint/type/JournalArticle	en_US
eprint.status	http://purl.org/eprint/status/PeerReviewed	en_US
dspace.orderedauthors	Doshi-Velez, Finale; Pineau, Joelle; Roy, Nicholas	en_US
dspace.embargo.terms	N	en_US
dc.identifier.orcid	https://orcid.org/0000-0002-8293-0492
mit.license	PUBLISHER_CC	en_US

Files in this item

Name: fdoshi-aij12 (1).pdf
Size: 297.6Kb
Format: PDF

This item appears in the following Collection(s)

MIT Open Access Articles
