Reinforcement learning with multi-fidelity simulators

Cutler, Mark; Walsh, Thomas J.; How, Jonathan P.

dc.contributor.author	Cutler, Mark Johnson
dc.contributor.author	Walsh, Thomas J
dc.contributor.author	How, Jonathan P
dc.date.accessioned	2016-10-24T15:21:40Z
dc.date.available	2016-10-24T15:21:40Z
dc.date.issued	2014-06
dc.identifier.isbn	978-1-4799-3685-4
dc.identifier.issn	1050-4729
dc.identifier.uri	http://hdl.handle.net/1721.1/104936
dc.description.abstract	We present a framework for reinforcement learning (RL) in a scenario where multiple simulators are available with decreasing amounts of fidelity to the real-world learning scenario. Our framework is designed to limit the number of samples used in each successively higher-fidelity/cost simulator by allowing the agent to choose to run trajectories at the lowest level that will still provide it with information. The approach transfers state-action Q-values from lower-fidelity models as heuristics for the “Knows What It Knows” family of RL algorithms, which is applicable over a wide range of possible dynamics and reward representations. Theoretical proofs of the framework's sample complexity are given and empirical results are demonstrated on a remote controlled car with multiple simulators. The approach allows RL algorithms to find near-optimal policies for the real world with fewer expensive real-world samples than previous transfer approaches or learning without simulators.	en_US
dc.language.iso	en_US
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)	en_US
dc.relation.isversionof	http://dx.doi.org/10.1109/ICRA.2014.6907423	en_US
dc.rights	Creative Commons Attribution-Noncommercial-Share Alike	en_US
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/4.0/	en_US
dc.source	MIT web domain	en_US
dc.title	Reinforcement learning with multi-fidelity simulators	en_US
dc.type	Article	en_US
dc.identifier.citation	Cutler, Mark, Thomas J. Walsh, and Jonathan P. How. “Reinforcement Learning with Multi-Fidelity Simulators.” IEEE, 2014. 3888–3895.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Aeronautics and Astronautics	en_US
dc.contributor.department	Massachusetts Institute of Technology. Laboratory for Information and Decision Systems	en_US
dc.contributor.mitauthor	Cutler, Mark Johnson
dc.contributor.mitauthor	Walsh, Thomas J
dc.contributor.mitauthor	How, Jonathan P
dc.relation.journal	IEEE International Conference on Robotics and Automation. Proceedings	en_US
dc.eprint.version	Author's final manuscript	en_US
dc.type.uri	http://purl.org/eprint/type/ConferencePaper	en_US
eprint.status	http://purl.org/eprint/status/NonPeerReviewed	en_US
dspace.orderedauthors	Cutler, Mark; Walsh, Thomas J.; How, Jonathan P.	en_US
dspace.embargo.terms	N	en_US
dc.identifier.orcid	https://orcid.org/0000-0003-0776-7901
dc.identifier.orcid	https://orcid.org/0000-0001-8576-1930
mit.license	OPEN_ACCESS_POLICY	en_US
mit.metadata.status	Complete

Files in this item

Name: How_Reinforcement learning.pdf
Size: 5.124Mb
Format: PDF

This item appears in the following Collection(s)

MIT Open Access Articles
