dc.contributor.author: Cutler, Mark Johnson
dc.contributor.author: Walsh, Thomas J
dc.contributor.author: How, Jonathan P
dc.date.accessioned: 2016-10-24T15:21:40Z
dc.date.available: 2016-10-24T15:21:40Z
dc.date.issued: 2014-06
dc.identifier.isbn: 978-1-4799-3685-4
dc.identifier.issn: 1050-4729
dc.identifier.uri: http://hdl.handle.net/1721.1/104936
dc.description.abstract: We present a framework for reinforcement learning (RL) in a scenario where multiple simulators are available with decreasing amounts of fidelity to the real-world learning scenario. Our framework is designed to limit the number of samples used in each successively higher-fidelity/cost simulator by allowing the agent to choose to run trajectories at the lowest level that will still provide it with information. The approach transfers state-action Q-values from lower-fidelity models as heuristics for the “Knows What It Knows” family of RL algorithms, which is applicable over a wide range of possible dynamics and reward representations. Theoretical proofs of the framework's sample complexity are given and empirical results are demonstrated on a remote controlled car with multiple simulators. The approach allows RL algorithms to find near-optimal policies for the real world with fewer expensive real-world samples than previous transfer approaches or learning without simulators. [en_US]
dc.language.iso: en_US
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE) [en_US]
dc.relation.isversionof: http://dx.doi.org/10.1109/ICRA.2014.6907423 [en_US]
dc.rights: Creative Commons Attribution-Noncommercial-Share Alike [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/ [en_US]
dc.source: MIT web domain [en_US]
dc.title: Reinforcement learning with multi-fidelity simulators [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Cutler, Mark, Thomas J. Walsh, and Jonathan P. How. “Reinforcement Learning with Multi-Fidelity Simulators.” IEEE, 2014. 3888–3895. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Aeronautics and Astronautics [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Laboratory for Information and Decision Systems [en_US]
dc.contributor.mitauthor: Cutler, Mark Johnson
dc.contributor.mitauthor: Walsh, Thomas J
dc.contributor.mitauthor: How, Jonathan P
dc.relation.journal: IEEE International Conference on Robotics and Automation. Proceedings [en_US]
dc.eprint.version: Author's final manuscript [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
eprint.status: http://purl.org/eprint/status/NonPeerReviewed [en_US]
dspace.orderedauthors: Cutler, Mark; Walsh, Thomas J.; How, Jonathan P. [en_US]
dspace.embargo.terms: N [en_US]
dc.identifier.orcid: https://orcid.org/0000-0003-0776-7901
dc.identifier.orcid: https://orcid.org/0000-0001-8576-1930
mit.license: OPEN_ACCESS_POLICY [en_US]
mit.metadata.status: Complete

