Augmenting Policy Learning with Routines Discovered from a Single Demonstration

Zhao, Zelin; Gan, Chuang; Wu, Jiajun; Guo, Xiaoxiao; Tenenbaum, Joshua B

dc.contributor.author	Zhao, Zelin
dc.contributor.author	Gan, Chuang
dc.contributor.author	Wu, Jiajun
dc.contributor.author	Guo, Xiaoxiao
dc.contributor.author	Tenenbaum, Joshua B
dc.date.accessioned	2023-04-04T15:40:27Z
dc.date.available	2023-04-04T15:40:27Z
dc.date.issued	2021
dc.identifier.uri	https://hdl.handle.net/1721.1/150390
dc.description.abstract	Humans can abstract prior knowledge from very little data and use it to boost skill learning. In this paper, we propose routine-augmented policy learning (RAPL), which discovers routines composed of primitive actions from a single demonstration and uses the discovered routines to augment policy learning. To discover routines from the demonstration, we first abstract routine candidates by identifying grammar over the demonstrated action trajectory. Then, the best routines, measured by length and frequency, are selected to form a routine library. We propose to learn the policy simultaneously at the primitive level and the routine level with the discovered routines, leveraging their temporal structure. Our approach enables imitating expert behavior at multiple temporal scales for imitation learning and promotes exploration in reinforcement learning. Extensive experiments on Atari games demonstrate that RAPL improves the state-of-the-art imitation learning method SQIL and the reinforcement learning method A2C. Further, we show that the discovered routines can generalize to unseen levels and difficulties on the CoinRun benchmark.	en_US
dc.language.iso	en
dc.publisher	Association for the Advancement of Artificial Intelligence (AAAI)	en_US
dc.relation.isversionof	10.1609/AAAI.V35I12.17316	en_US
dc.rights	Creative Commons Attribution-Noncommercial-Share Alike	en_US
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/4.0/	en_US
dc.source	arXiv	en_US
dc.title	Augmenting Policy Learning with Routines Discovered from a Single Demonstration	en_US
dc.type	Article	en_US
dc.identifier.citation	Zhao, Zelin, Gan, Chuang, Wu, Jiajun, Guo, Xiaoxiao and Tenenbaum, Joshua B. 2021. "Augmenting Policy Learning with Routines Discovered from a Single Demonstration." Proceedings of the AAAI Conference on Artificial Intelligence, 35 (12).
dc.contributor.department	Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences	en_US
dc.relation.journal	Proceedings of the AAAI Conference on Artificial Intelligence	en_US
dc.eprint.version	Author's final manuscript	en_US
dc.type.uri	http://purl.org/eprint/type/ConferencePaper	en_US
eprint.status	http://purl.org/eprint/status/NonPeerReviewed	en_US
dc.date.updated	2023-04-04T15:29:53Z
dspace.orderedauthors	Zhao, Z; Gan, C; Wu, J; Guo, X; Tenenbaum, JB	en_US
dspace.date.submission	2023-04-04T15:29:55Z
mit.journal.volume	35	en_US
mit.journal.issue	12	en_US
mit.license	OPEN_ACCESS_POLICY
mit.metadata.status	Authority Work and Publication Information Needed	en_US
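
The routine-discovery step summarized in the abstract (candidate routines taken from a single action trajectory, ranked by length and frequency, collected into a library that extends the action space) can be illustrated with a small sketch. This is not the authors' implementation: the paper identifies grammar over the demonstrated trajectory, whereas this sketch substitutes a simpler count of repeated contiguous action subsequences. The length-times-frequency score, the parameters `k`, `min_len`, `max_len` and `min_freq`, and the helper names `discover_routines` and `expand` are all hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Routine = Tuple[int, ...]  # a routine is a fixed sequence of primitive action ids


def _contains(outer: Routine, inner: Routine) -> bool:
    """True if `inner` occurs as a contiguous block inside `outer`."""
    n = len(inner)
    return any(outer[i:i + n] == inner for i in range(len(outer) - n + 1))


def discover_routines(actions: Sequence[int], k: int = 3, min_len: int = 2,
                      max_len: int = 5, min_freq: int = 2) -> List[Routine]:
    """Hypothetical stand-in for RAPL's routine discovery.

    Counts repeated contiguous action subsequences (overlapping occurrences
    included) instead of inducing a grammar, then keeps up to k candidates
    ranked by length * frequency, skipping any candidate already contained
    in a selected routine.
    """
    counts: Counter = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(actions) - n + 1):
            counts[tuple(actions[i:i + n])] += 1

    # A subsequence seen fewer than min_freq times is not treated as a routine.
    candidates = [(seq, c) for seq, c in counts.items() if c >= min_freq]
    # Assumed score: length * frequency; ties go to the longer routine.
    candidates.sort(key=lambda sc: (len(sc[0]) * sc[1], len(sc[0])), reverse=True)

    library: List[Routine] = []
    for seq, _ in candidates:
        if any(_contains(r, seq) for r in library):
            continue  # redundant: already covered by a longer selected routine
        library.append(seq)
        if len(library) == k:
            break
    return library


def expand(action: int, num_primitives: int, library: List[Routine]) -> Routine:
    """Map an index in the augmented action space to primitive actions.

    Indices below num_primitives are primitives; the rest select routines.
    """
    if action < num_primitives:
        return (action,)
    return library[action - num_primitives]


if __name__ == "__main__":
    demo = [0, 1, 2, 0, 1, 2, 3, 0, 1, 2]  # toy single demonstration
    routines = discover_routines(demo, k=2, max_len=3)
    print(routines)                  # → [(0, 1, 2)]
    print(expand(4, 4, routines))    # → (0, 1, 2)
```

Representing each routine as one extra discrete action, as `expand` does here, is one simple way to let a policy choose at both the primitive level and the routine level; the paper itself describes how RAPL trains at the two levels.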

Files in this item

Name: 2012.12469.pdf
Size: 1.819Mb
Format: PDF
Description: Accepted version

This item appears in the following Collection(s)

MIT Open Access Articles