MIT Libraries logoDSpace@MIT

MIT
View Item 
  • DSpace@MIT Home
  • MIT Open Access Articles
  • MIT Open Access Articles
  • View Item
  • DSpace@MIT Home
  • MIT Open Access Articles
  • MIT Open Access Articles
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.

Augmenting Policy Learning with Routines Discovered from a Single Demonstration

Author(s)
Zhao, Zelin; Gan, Chuang; Wu, Jiajun; Guo, Xiaoxiao; Tenenbaum, Joshua B
Thumbnail
DownloadAccepted version (1.819Mb)
Open Access Policy

Open Access Policy

Creative Commons Attribution-Noncommercial-Share Alike

Terms of use
Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/
Metadata
Show full item record
Abstract
<jats:p>Humans can abstract prior knowledge from very little data and use it to boost skill learning. In this paper, we propose routine-augmented policy learning (RAPL), which discovers routines composed of primitive actions from a single demonstration and uses discovered routines to augment policy learning. To discover routines from the demonstration, we first abstract routine candidates by identifying grammar over the demonstrated action trajectory. Then, the best routines measured by length and frequency are selected to form a routine library. We propose to learn policy simultaneously at primitive-level and routine-level with discovered routines, leveraging the temporal structure of routines. Our approach enables imitating expert behavior at multiple temporal scales for imitation learning and promotes reinforcement learning exploration. Extensive experiments on Atari games demonstrate that RAPL improves the state-of-the-art imitation learning method SQIL and reinforcement learning method A2C. Further, we show that discovered routines can generalize to unseen levels and difficulties on the CoinRun benchmark.</jats:p>
Date issued
2021
URI
https://hdl.handle.net/1721.1/150390
Department
Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences
Journal
Proceedings of the AAAI Conference on Artificial Intelligence
Publisher
Association for the Advancement of Artificial Intelligence (AAAI)
Citation
Zhao, Zelin, Gan, Chuang, Wu, Jiajun, Guo, Xiaoxiao and Tenenbaum, Joshua B. 2021. "Augmenting Policy Learning with Routines Discovered from a Single Demonstration." Proceedings of the AAAI Conference on Artificial Intelligence, 35 (12).
Version: Author's final manuscript

Collections
  • MIT Open Access Articles

Browse

All of DSpaceCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

My Account

Login

Statistics

OA StatisticsStatistics by CountryStatistics by Department
MIT Libraries
PrivacyPermissionsAccessibilityContact us
MIT
Content created by the MIT Libraries, CC BY-NC unless otherwise noted. Notify us about copyright concerns.