
dc.contributor.author  How, Jonathan P.
dc.contributor.author  Michini, Bernard J.
dc.date.accessioned  2013-10-23T16:12:50Z
dc.date.available  2013-10-23T16:12:50Z
dc.date.issued  2012-09
dc.identifier.isbn  978-3-642-33485-6
dc.identifier.isbn  978-3-642-33486-3
dc.identifier.issn  0302-9743
dc.identifier.issn  1611-3349
dc.identifier.uri  http://hdl.handle.net/1721.1/81484
dc.description.abstract  Inverse reinforcement learning (IRL) is the task of learning the reward function of a Markov Decision Process (MDP) given the transition function and a set of observed demonstrations in the form of state-action pairs. Current IRL algorithms attempt to find a single reward function which explains the entire observation set. In practice, this leads to a computationally costly search over a large (typically infinite) space of complex reward functions. This paper proposes the notion that if the observations can be partitioned into smaller groups, a class of much simpler reward functions can be used to explain each group. The proposed method uses a Bayesian nonparametric mixture model to automatically partition the data and find a set of simple reward functions corresponding to each partition. The simple rewards are interpreted intuitively as subgoals, which can be used to predict actions or analyze which states are important to the demonstrator. Experimental results are given for simple examples showing comparable performance to other IRL algorithms in nominal situations. Moreover, the proposed method handles cyclic tasks (where the agent begins and ends in the same state) that would break existing algorithms without modification. Finally, the new algorithm has a fundamentally different structure than previous methods, making it more computationally efficient in a real-world learning scenario where the state space is large but the demonstration set is small.
dc.language.iso  en_US
dc.publisher  Springer-Verlag
dc.relation.isversionof  http://dx.doi.org/10.1007/978-3-642-33486-3_10
dc.rights  Creative Commons Attribution-Noncommercial-Share Alike 3.0
dc.rights.uri  http://creativecommons.org/licenses/by-nc-sa/3.0/
dc.source  Other University Web Domain
dc.title  Bayesian Nonparametric Inverse Reinforcement Learning
dc.type  Article
dc.identifier.citation  Michini, Bernard, and Jonathan P. How. Bayesian Nonparametric Inverse Reinforcement Learning. Springer-Verlag, 2012.
dc.contributor.department  Massachusetts Institute of Technology. Department of Aeronautics and Astronautics
dc.contributor.mitauthor  Michini, Bernard J.
dc.contributor.mitauthor  How, Jonathan P.
dc.relation.journal  Machine Learning and Knowledge Discovery in Databases
dc.eprint.version  Author's final manuscript
dc.type.uri  http://purl.org/eprint/type/ConferencePaper
eprint.status  http://purl.org/eprint/status/NonPeerReviewed
dspace.orderedauthors  Michini, Bernard; How, Jonathan P.
dc.identifier.orcid  https://orcid.org/0000-0001-8576-1930
mit.license  OPEN_ACCESS_POLICY
mit.metadata.status  Complete
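
The abstract above summarizes the paper's approach: partition the demonstrated state-action pairs with a Bayesian nonparametric mixture and explain each partition with a simple "subgoal" reward. The following is a minimal illustrative sketch of that partitioning idea in Python, assuming a toy chain of states, actions of -1/+1, a softmax action likelihood standing in for an optimal Q-function, and a Chinese-restaurant-process prior with Gibbs resampling of assignments; the domain, constants, and sampling details are assumptions for illustration, not the authors' algorithm.

# Sketch only: CRP mixture over demonstration pairs, each cluster a subgoal.
import math
import random
from collections import Counter

N_STATES = 10    # toy chain of states 0..9 (assumed domain)
ALPHA = 1.0      # CRP concentration parameter (assumed value)
BETA = 5.0       # sharpness of the action likelihood (assumed value)

def action_loglik(state, action, subgoal):
    """Log-likelihood of an observed action under a candidate subgoal:
    moves that bring the agent closer to the subgoal are exponentially
    more likely (a softmax stand-in for an optimal Q-function)."""
    def progress(a):
        nxt = min(max(state + a, 0), N_STATES - 1)
        return -abs(nxt - subgoal)
    logits = {a: BETA * progress(a) for a in (-1, +1)}
    norm = math.log(sum(math.exp(v) for v in logits.values()))
    return logits[action] - norm

def gibbs_sweep(demos, assignments, subgoals):
    """One Gibbs sweep: reassign each demonstration pair either to an
    existing partition (weight ~ partition size * action likelihood under
    that partition's subgoal) or to a fresh partition (weight ~ ALPHA)."""
    for i, (s, a) in enumerate(demos):
        assignments[i] = None
        counts = Counter(c for c in assignments if c is not None)
        options, weights = [], []
        for c, n in counts.items():
            options.append(c)
            weights.append(n * math.exp(action_loglik(s, a, subgoals[c])))
        # Weight for opening a new partition: average the likelihood over
        # all candidate subgoals (a crude stand-in for integrating them out).
        new_id = max(subgoals) + 1
        avg = sum(math.exp(action_loglik(s, a, g)) for g in range(N_STATES)) / N_STATES
        options.append(new_id)
        weights.append(ALPHA * avg)
        choice = random.choices(options, weights=weights)[0]
        if choice == new_id:
            # Draw a subgoal for the new partition, favouring states that
            # explain the observed action well.
            gw = [math.exp(action_loglik(s, a, g)) for g in range(N_STATES)]
            subgoals[new_id] = random.choices(range(N_STATES), weights=gw)[0]
        assignments[i] = choice
    return assignments, subgoals

# Toy usage: one leg of the demonstration walks right toward a high state,
# the other walks left toward a low state, so two subgoals should emerge.
demos = [(s, +1) for s in range(0, 7)] + [(s, -1) for s in range(9, 3, -1)]
assignments = list(range(len(demos)))                     # one partition per pair
subgoals = {i: random.randrange(N_STATES) for i in range(len(demos))}
for _ in range(50):
    assignments, subgoals = gibbs_sweep(demos, assignments, subgoals)
print({c: subgoals[c] for c in sorted(set(assignments))})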

