dc.contributor.author: Kim, Dong-Ki
dc.contributor.author: Omidshafiei, Shayegan
dc.contributor.author: Pazis, Jason
dc.contributor.author: How, Jonathan P
dc.date.accessioned: 2022-07-15T20:16:56Z
dc.date.available: 2021-09-20T17:30:46Z
dc.date.available: 2022-07-15T20:16:56Z
dc.date.issued: 2020-01-13
dc.identifier.uri: https://hdl.handle.net/1721.1/131879.2
dc.description.abstract: This paper introduces the Crossmodal Attentive Skill Learner (CASL), integrated with the recently introduced Asynchronous Advantage Option-Critic architecture [Harb et al., "When waiting is not an option: learning options with a deliberation cost", arXiv preprint arXiv:1709.04571, 2017] to enable hierarchical reinforcement learning across multiple sensory inputs. Agents trained using our approach learn to attend to their various sensory modalities (e.g., audio, video) at the appropriate moments, thereby executing actions based on multiple sensory streams without reliance on supervisory data. We demonstrate empirically that the sensory attention mechanism anticipates and identifies useful latent features, while filtering irrelevant sensor modalities during execution. Further, we provide concrete examples in which the approach not only improves performance in a single task, but also accelerates transfer to new tasks. We modify the Arcade Learning Environment [Bellemare et al., J Artif Intell Res 47:253–279, 2013] to support audio queries (ALE-audio code available at https://github.com/shayegano/Arcade-Learning-Environment), and conduct evaluations of crossmodal learning in the Atari 2600 games H.E.R.O. and Amidar. Finally, building on the recent work of Babaeizadeh et al. [International Conference on Learning Representations (ICLR), 2017], we open-source a fast hybrid CPU–GPU implementation of CASL (CASL code available at https://github.com/shayegano/CASL). [en_US]
dc.publisher: Springer US [en_US]
dc.relation.isversionof: https://doi.org/10.1007/s10458-019-09439-5 [en_US]
dc.rights: Creative Commons Attribution-Noncommercial-Share Alike [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/ [en_US]
dc.source: Springer US [en_US]
dc.title: Crossmodal attentive skill learner: learning in Atari and beyond with audio–video inputs [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Autonomous Agents and Multi-Agent Systems. 2020 Jan 13;34(1):16 [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Aeronautics and Astronautics
dc.eprint.version: Author's final manuscript [en_US]
dc.type.uri: http://purl.org/eprint/type/JournalArticle [en_US]
eprint.status: http://purl.org/eprint/status/PeerReviewed [en_US]
dc.date.updated: 2020-09-24T21:37:40Z
dc.language.rfc3066: en
dc.rights.holder: Springer Science+Business Media, LLC, part of Springer Nature
dspace.embargo.terms: Y
dspace.date.submission: 2020-09-24T21:37:40Z
mit.license: OPEN_ACCESS_POLICY
mit.metadata.status: Publication Information Needed [en_US]
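
The abstract above describes an agent that learns attention weights over its sensory modalities (audio, video) and fuses them before acting. Below is a minimal, hypothetical PyTorch sketch of that crossmodal-attention idea; the class, layer names, and dimensions are illustrative assumptions, not the authors' released implementation (which is available at https://github.com/shayegano/CASL).

# Minimal sketch of attention over two sensory modalities.
# Hypothetical names and dimensions; NOT the authors' CASL code.
import torch
import torch.nn as nn

class CrossmodalAttention(nn.Module):
    """Learns per-modality attention weights and fuses audio/video features."""
    def __init__(self, audio_dim: int, video_dim: int, hidden_dim: int):
        super().__init__()
        # Project each modality into a shared feature space.
        self.audio_proj = nn.Linear(audio_dim, hidden_dim)
        self.video_proj = nn.Linear(video_dim, hidden_dim)
        # Produces one attention logit per modality from the joint features.
        self.attend = nn.Linear(2 * hidden_dim, 2)

    def forward(self, audio_feat, video_feat):
        a = torch.tanh(self.audio_proj(audio_feat))      # (B, H)
        v = torch.tanh(self.video_proj(video_feat))      # (B, H)
        logits = self.attend(torch.cat([a, v], dim=-1))  # (B, 2)
        alpha = torch.softmax(logits, dim=-1)            # attention over modalities
        # Fuse: attention-weighted sum of the projected modality features.
        fused = alpha[:, 0:1] * a + alpha[:, 1:2] * v    # (B, H)
        return fused, alpha

# Usage: shapes only; in a CASL-style agent the fused features would
# feed a recurrent option-critic policy.
attn = CrossmodalAttention(audio_dim=128, video_dim=256, hidden_dim=64)
fused, alpha = attn(torch.randn(4, 128), torch.randn(4, 256))
print(fused.shape, alpha.shape)  # torch.Size([4, 64]) torch.Size([4, 2])

Inspecting alpha at each timestep is what lets one check whether the agent attends to audio or video at the appropriate moments, as the abstract describes.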

