
dc.contributor.author	Joseph, Joshua Mason
dc.contributor.author	Geramifard, Alborz
dc.contributor.author	Roberts, John W.
dc.contributor.author	How, Jonathan P.
dc.contributor.author	Roy, Nicholas
dc.date.accessioned	2015-05-08T18:36:13Z
dc.date.available	2015-05-08T18:36:13Z
dc.date.issued	2013-05
dc.identifier.isbn	978-1-4673-5643-5
dc.identifier.isbn	978-1-4673-5641-1
dc.identifier.issn	1050-4729
dc.identifier.uri	http://hdl.handle.net/1721.1/96945
dc.description.abstract	Real-world robots commonly have to act in complex, poorly understood environments where the true world dynamics are unknown. To compensate for the unknown world dynamics, we often provide a class of models to a learner so it may select a model, typically using a minimum prediction error metric over a set of training data. Often in real-world domains the model class is unable to capture the true dynamics, due to either limited domain knowledge or a desire to use a small model. In these cases we call the model class misspecified, and an unfortunate consequence of misspecification is that even with unlimited data and computation there is no guarantee the model with minimum prediction error leads to the best performing policy. In this work, our approach improves upon the standard maximum likelihood model selection metric by explicitly selecting the model which achieves the highest expected reward, rather than the most likely model. We present an algorithm for which the highest performing model from the model class is guaranteed to be found given unlimited data and computation. Empirically, we demonstrate that our algorithm is often superior to the maximum likelihood learner in a batch learning setting for two common RL benchmark problems and a third real-world system, the hydrodynamic cart-pole, a domain whose complex dynamics cannot be known exactly.	en_US
dc.description.sponsorship	United States. Office of Naval Research. Multidisciplinary University Research Initiative (N00014-11-1-0688)	en_US
dc.language.iso	en_US
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)	en_US
dc.relation.isversionof	http://dx.doi.org/10.1109/ICRA.2013.6630686	en_US
dc.rights	Creative Commons Attribution-Noncommercial-Share Alike	en_US
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/4.0/	en_US
dc.source	MIT web domain	en_US
dc.title	Reinforcement learning with misspecified model classes	en_US
dc.type	Article	en_US
dc.identifier.citation	Joseph, Joshua, Alborz Geramifard, John W. Roberts, Jonathan P. How, and Nicholas Roy. “Reinforcement Learning with Misspecified Model Classes.” 2013 IEEE International Conference on Robotics and Automation (May 2013).	en_US
dc.contributor.department	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Aeronautics and Astronautics	en_US
dc.contributor.department	Massachusetts Institute of Technology. Laboratory for Information and Decision Systems	en_US
dc.contributor.mitauthor	Joseph, Joshua Mason	en_US
dc.contributor.mitauthor	Geramifard, Alborz	en_US
dc.contributor.mitauthor	Roberts, John W.	en_US
dc.contributor.mitauthor	How, Jonathan P.	en_US
dc.contributor.mitauthor	Roy, Nicholas	en_US
dc.relation.journal	Proceedings of the 2013 IEEE International Conference on Robotics and Automation	en_US
dc.eprint.version	Author's final manuscript	en_US
dc.type.uri	http://purl.org/eprint/type/JournalArticle	en_US
eprint.status	http://purl.org/eprint/status/PeerReviewed	en_US
dspace.orderedauthors	Joseph, Joshua; Geramifard, Alborz; Roberts, John W.; How, Jonathan P.; Roy, Nicholas	en_US
dc.identifier.orcid	https://orcid.org/0000-0001-8576-1930
dc.identifier.orcid	https://orcid.org/0000-0002-2508-1957
dc.identifier.orcid	https://orcid.org/0000-0002-8293-0492
mit.license	OPEN_ACCESS_POLICY	en_US
mit.metadata.status	Complete
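
The abstract above describes selecting, from a possibly misspecified model class, the model whose planned policy achieves the highest expected reward, rather than the model with minimum prediction error (maximum likelihood). The sketch below is only an illustration of that contrast between selection criteria, not the paper's algorithm or its guarantees; the Model interface (fit, log_likelihood) and the plan_policy and evaluate_policy helpers are hypothetical placeholders.

    # Illustrative sketch only -- not the published algorithm.
    # Assumed interfaces: Model.fit(data), Model.log_likelihood(data),
    # plan_policy(model) -> policy, evaluate_policy(policy) -> estimated return
    # (e.g., from Monte Carlo rollouts or held-out interaction data).

    def select_model_max_likelihood(model_class, data):
        """Standard criterion: keep the model that best predicts the training data."""
        fitted = [m.fit(data) for m in model_class]
        return max(fitted, key=lambda m: m.log_likelihood(data))

    def select_model_by_reward(model_class, data, plan_policy, evaluate_policy):
        """Reward-based criterion: keep the model whose induced policy
        earns the highest estimated expected return."""
        best_model, best_return = None, float("-inf")
        for m in model_class:
            fitted = m.fit(data)                 # fit candidate model to the batch data
            policy = plan_policy(fitted)         # plan as if this model were the true dynamics
            ret = evaluate_policy(policy)        # estimate the policy's expected return
            if ret > best_return:
                best_model, best_return = fitted, ret
        return best_model

Under misspecification the two criteria can disagree: the most likely model need not induce the best-performing policy, which is the gap the reward-based criterion is meant to close.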