Scalable reward learning from demonstration

Michini, Bernard; Cutler, Mark; How, Jonathan P.

dc.contributor.author	Michini, Bernard J.
dc.contributor.author	How, Jonathan P.
dc.contributor.author	Cutler, Mark Johnson
dc.date.accessioned	2015-05-08T18:42:15Z
dc.date.available	2015-05-08T18:42:15Z
dc.date.issued	2013-05
dc.identifier.isbn	978-1-4673-5643-5
dc.identifier.isbn	978-1-4673-5641-1
dc.identifier.issn	1050-4729
dc.identifier.uri	http://hdl.handle.net/1721.1/96946
dc.description.abstract	Reward learning from demonstration is the task of inferring the intents or goals of an agent demonstrating a task. Inverse reinforcement learning methods utilize the Markov decision process (MDP) framework to learn rewards, but typically scale poorly since they rely on the calculation of optimal value functions. Several key modifications are made to a previously developed Bayesian nonparametric inverse reinforcement learning algorithm that avoid calculation of an optimal value function and no longer require discretization of the state or action spaces. Experimental results given demonstrate the ability of the resulting algorithm to scale to larger problems and learn in domains with continuous demonstrations.	en_US
dc.description.sponsorship	United States. Office of Naval Research (Autonomy Program Contract N000140910625)	en_US
dc.language.iso	en_US
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)	en_US
dc.relation.isversionof	http://dx.doi.org/10.1109/ICRA.2013.6630592	en_US
dc.rights	Creative Commons Attribution-Noncommercial-Share Alike	en_US
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/4.0/	en_US
dc.source	MIT web domain	en_US
dc.title	Scalable reward learning from demonstration	en_US
dc.type	Article	en_US
dc.identifier.citation	Michini, Bernard, Mark Cutler, and Jonathan P. How. “Scalable Reward Learning from Demonstration.” 2013 IEEE International Conference on Robotics and Automation (May 2013).	en_US
dc.contributor.department	Massachusetts Institute of Technology. Aerospace Controls Laboratory	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Aeronautics and Astronautics	en_US
dc.contributor.mitauthor	Michini, Bernard J.	en_US
dc.contributor.mitauthor	Cutler, Mark Johnson	en_US
dc.contributor.mitauthor	How, Jonathan P.	en_US
dc.relation.journal	Proceedings of the 2013 IEEE International Conference on Robotics and Automation	en_US
dc.eprint.version	Author's final manuscript	en_US
dc.type.uri	http://purl.org/eprint/type/JournalArticle	en_US
eprint.status	http://purl.org/eprint/status/PeerReviewed	en_US
dspace.orderedauthors	Michini, Bernard; Cutler, Mark; How, Jonathan P.	en_US
dc.identifier.orcid	https://orcid.org/0000-0001-8576-1930
dc.identifier.orcid	https://orcid.org/0000-0003-0776-7901
mit.license	OPEN_ACCESS_POLICY	en_US
mit.metadata.status	Complete

Files in this item

Name: How_Scalable reward.pdf
Size: 1.500Mb
Format: PDF

This item appears in the following Collection(s)

MIT Open Access Articles
