
dc.contributor.author  How, Jonathan P.
dc.contributor.author  Michini, Bernard J.
dc.date.accessioned  2013-10-23T16:12:50Z
dc.date.available  2013-10-23T16:12:50Z
dc.date.issued  2012-09
dc.identifier.isbn  978-3-642-33485-6
dc.identifier.isbn  978-3-642-33486-3
dc.identifier.issn  0302-9743
dc.identifier.issn  1611-3349
dc.identifier.uri  http://hdl.handle.net/1721.1/81484
dc.description.abstract  Inverse reinforcement learning (IRL) is the task of learning the reward function of a Markov Decision Process (MDP) given the transition function and a set of observed demonstrations in the form of state-action pairs. Current IRL algorithms attempt to find a single reward function which explains the entire observation set. In practice, this leads to a computationally costly search over a large (typically infinite) space of complex reward functions. This paper proposes the notion that if the observations can be partitioned into smaller groups, a class of much simpler reward functions can be used to explain each group. The proposed method uses a Bayesian nonparametric mixture model to automatically partition the data and find a set of simple reward functions corresponding to each partition. The simple rewards are interpreted intuitively as subgoals, which can be used to predict actions or analyze which states are important to the demonstrator. Experimental results are given for simple examples showing comparable performance to other IRL algorithms in nominal situations. Moreover, the proposed method handles cyclic tasks (where the agent begins and ends in the same state) that would break existing algorithms without modification. Finally, the new algorithm has a fundamentally different structure than previous methods, making it more computationally efficient in a real-world learning scenario where the state space is large but the demonstration set is small.
dc.language.iso  en_US
dc.publisher  Springer-Verlag
dc.relation.isversionof  http://dx.doi.org/10.1007/978-3-642-33486-3_10
dc.rights  Creative Commons Attribution-Noncommercial-Share Alike 3.0
dc.rights.uri  http://creativecommons.org/licenses/by-nc-sa/3.0/
dc.source  Other University Web Domain
dc.title  Bayesian Nonparametric Inverse Reinforcement Learning
dc.type  Article
dc.identifier.citation  Michini, Bernard, and Jonathan P. How. Bayesian Nonparametric Inverse Reinforcement Learning. Springer-Verlag, 2012.
dc.contributor.department  Massachusetts Institute of Technology. Department of Aeronautics and Astronautics
dc.contributor.mitauthor  Michini, Bernard J.
dc.contributor.mitauthor  How, Jonathan P.
dc.relation.journal  Machine Learning and Knowledge Discovery in Databases
dc.eprint.version  Author's final manuscript
dc.type.uri  http://purl.org/eprint/type/ConferencePaper
eprint.status  http://purl.org/eprint/status/NonPeerReviewed
dspace.orderedauthors  Michini, Bernard; How, Jonathan P.
dc.identifier.orcid  https://orcid.org/0000-0001-8576-1930
mit.license  OPEN_ACCESS_POLICY
mit.metadata.status  Complete
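
The abstract above summarizes the paper's approach: partition the demonstrated state-action pairs with a Bayesian nonparametric mixture and explain each partition with a simple "subgoal" reward. The following is a minimal illustrative sketch of that partitioning idea in Python, assuming a toy chain of states, actions of -1/+1, a softmax action likelihood standing in for an optimal Q-function, and a Chinese-restaurant-process prior with Gibbs resampling of assignments; the domain, constants, and sampling details are assumptions for illustration, not the authors' algorithm.

# Sketch only: CRP mixture over demonstration pairs, each cluster a subgoal.
import math
import random
from collections import Counter

N_STATES = 10    # toy chain of states 0..9 (assumed domain)
ALPHA = 1.0      # CRP concentration parameter (assumed value)
BETA = 5.0       # sharpness of the action likelihood (assumed value)

def action_loglik(state, action, subgoal):
    """Log-likelihood of an observed action under a candidate subgoal:
    moves that bring the agent closer to the subgoal are exponentially
    more likely (a softmax stand-in for an optimal Q-function)."""
    def progress(a):
        nxt = min(max(state + a, 0), N_STATES - 1)
        return -abs(nxt - subgoal)
    logits = {a: BETA * progress(a) for a in (-1, +1)}
    norm = math.log(sum(math.exp(v) for v in logits.values()))
    return logits[action] - norm

def gibbs_sweep(demos, assignments, subgoals):
    """One Gibbs sweep: reassign each demonstration pair either to an
    existing partition (weight ~ partition size * action likelihood under
    that partition's subgoal) or to a fresh partition (weight ~ ALPHA)."""
    for i, (s, a) in enumerate(demos):
        assignments[i] = None
        counts = Counter(c for c in assignments if c is not None)
        options, weights = [], []
        for c, n in counts.items():
            options.append(c)
            weights.append(n * math.exp(action_loglik(s, a, subgoals[c])))
        # Weight for opening a new partition: average the likelihood over
        # all candidate subgoals (a crude stand-in for integrating them out).
        new_id = max(subgoals) + 1
        avg = sum(math.exp(action_loglik(s, a, g)) for g in range(N_STATES)) / N_STATES
        options.append(new_id)
        weights.append(ALPHA * avg)
        choice = random.choices(options, weights=weights)[0]
        if choice == new_id:
            # Draw a subgoal for the new partition, favouring states that
            # explain the observed action well.
            gw = [math.exp(action_loglik(s, a, g)) for g in range(N_STATES)]
            subgoals[new_id] = random.choices(range(N_STATES), weights=gw)[0]
        assignments[i] = choice
    return assignments, subgoals

# Toy usage: one leg of the demonstration walks right toward a high state,
# the other walks left toward a low state, so two subgoals should emerge.
demos = [(s, +1) for s in range(0, 7)] + [(s, -1) for s in range(9, 3, -1)]
assignments = list(range(len(demos)))                     # one partition per pair
subgoals = {i: random.randrange(N_STATES) for i in range(len(demos))}
for _ in range(50):
    assignments, subgoals = gibbs_sweep(demos, assignments, subgoals)
print({c: subgoals[c] for c in sorted(set(assignments))})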

