dc.contributor.author | Cutler, Mark Johnson | |
dc.contributor.author | Walsh, Thomas J | |
dc.contributor.author | How, Jonathan P | |
dc.date.accessioned | 2016-10-24T15:21:40Z | |
dc.date.available | 2016-10-24T15:21:40Z | |
dc.date.issued | 2014-06 | |
dc.identifier.isbn | 978-1-4799-3685-4 | |
dc.identifier.issn | 1050-4729 | |
dc.identifier.uri | http://hdl.handle.net/1721.1/104936 | |
dc.description.abstract | We present a framework for reinforcement learning (RL) in a scenario where multiple simulators are available with decreasing amounts of fidelity to the real-world learning scenario. Our framework is designed to limit the number of samples used in each successively higher-fidelity/cost simulator by allowing the agent to choose to run trajectories at the lowest level that will still provide it with information. The approach transfers state-action Q-values from lower-fidelity models as heuristics for the “Knows What It Knows” family of RL algorithms, which is applicable over a wide range of possible dynamics and reward representations. Theoretical proofs of the framework's sample complexity are given and empirical results are demonstrated on a remote-controlled car with multiple simulators. The approach allows RL algorithms to find near-optimal policies for the real world with fewer expensive real-world samples than previous transfer approaches or learning without simulators. | en_US |
dc.language.iso | en_US | |
dc.publisher | Institute of Electrical and Electronics Engineers (IEEE) | en_US |
dc.relation.isversionof | http://dx.doi.org/10.1109/ICRA.2014.6907423 | en_US |
dc.rights | Creative Commons Attribution-Noncommercial-Share Alike | en_US |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/4.0/ | en_US |
dc.source | MIT web domain | en_US |
dc.title | Reinforcement learning with multi-fidelity simulators | en_US |
dc.type | Article | en_US |
dc.identifier.citation | Cutler, Mark, Thomas J. Walsh, and Jonathan P. How. “Reinforcement Learning with Multi-Fidelity Simulators.” IEEE, 2014. 3888–3895. | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Department of Aeronautics and Astronautics | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Laboratory for Information and Decision Systems | en_US |
dc.contributor.mitauthor | Cutler, Mark Johnson | |
dc.contributor.mitauthor | Walsh, Thomas J | |
dc.contributor.mitauthor | How, Jonathan P | |
dc.relation.journal | IEEE International Conference on Robotics and Automation. Proceedings | en_US |
dc.eprint.version | Author's final manuscript | en_US |
dc.type.uri | http://purl.org/eprint/type/ConferencePaper | en_US |
eprint.status | http://purl.org/eprint/status/NonPeerReviewed | en_US |
dspace.orderedauthors | Cutler, Mark; Walsh, Thomas J.; How, Jonathan P. | en_US |
dspace.embargo.terms | N | en_US |
dc.identifier.orcid | https://orcid.org/0000-0003-0776-7901 | |
dc.identifier.orcid | https://orcid.org/0000-0001-8576-1930 | |
mit.license | OPEN_ACCESS_POLICY | en_US |
mit.metadata.status | Complete | |