dc.contributor.author: Kim, Dong-Ki
dc.contributor.author: Omidshafiei, Shayegan
dc.contributor.author: Pazis, Jason
dc.contributor.author: How, Jonathan P
dc.date.accessioned: 2022-07-15T20:16:56Z
dc.date.available: 2021-09-20T17:30:46Z
dc.date.available: 2022-07-15T20:16:56Z
dc.date.issued: 2020-01-13
dc.identifier.uri: https://hdl.handle.net/1721.1/131879.2
dc.description.abstract: This paper introduces the Crossmodal Attentive Skill Learner (CASL), integrated with the recently introduced Asynchronous Advantage Option-Critic architecture [Harb et al., "When waiting is not an option: learning options with a deliberation cost", arXiv preprint arXiv:1709.04571, 2017] to enable hierarchical reinforcement learning across multiple sensory inputs. Agents trained using our approach learn to attend to their various sensory modalities (e.g., audio, video) at the appropriate moments, thereby executing actions based on multiple sensory streams without reliance on supervisory data. We demonstrate empirically that the sensory attention mechanism anticipates and identifies useful latent features, while filtering irrelevant sensor modalities during execution. Further, we provide concrete examples in which the approach not only improves performance in a single task, but also accelerates transfer to new tasks. We modify the Arcade Learning Environment [Bellemare et al., J Artif Intell Res 47:253–279, 2013] to support audio queries (ALE-audio code available at https://github.com/shayegano/Arcade-Learning-Environment), and conduct evaluations of crossmodal learning in the Atari 2600 games H.E.R.O. and Amidar. Finally, building on the recent work of Babaeizadeh et al. [International Conference on Learning Representations (ICLR), 2017], we open-source a fast hybrid CPU–GPU implementation of CASL (CASL code available at https://github.com/shayegano/CASL). [en_US]
dc.publisher: Springer US [en_US]
dc.relation.isversionof: https://doi.org/10.1007/s10458-019-09439-5 [en_US]
dc.rights: Creative Commons Attribution-Noncommercial-Share Alike [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/ [en_US]
dc.source: Springer US [en_US]
dc.title: Crossmodal attentive skill learner: learning in Atari and beyond with audio–video inputs [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Autonomous Agents and Multi-Agent Systems. 2020 Jan 13;34(1):16 [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Aeronautics and Astronautics
dc.eprint.version: Author's final manuscript [en_US]
dc.type.uri: http://purl.org/eprint/type/JournalArticle [en_US]
eprint.status: http://purl.org/eprint/status/PeerReviewed [en_US]
dc.date.updated: 2020-09-24T21:37:40Z
dc.language.rfc3066: en
dc.rights.holder: Springer Science+Business Media, LLC, part of Springer Nature
dspace.embargo.terms: Y
dspace.date.submission: 2020-09-24T21:37:40Z
mit.license: OPEN_ACCESS_POLICY
mit.metadata.status: Publication Information Needed [en_US]
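
The abstract above describes an agent that learns attention weights over its sensory modalities (audio, video) and fuses them before acting. Below is a minimal, hypothetical PyTorch sketch of that crossmodal-attention idea; the class, layer names, and dimensions are illustrative assumptions, not the authors' released implementation (which is available at https://github.com/shayegano/CASL).

# Minimal sketch of attention over two sensory modalities.
# Hypothetical names and dimensions; NOT the authors' CASL code.
import torch
import torch.nn as nn

class CrossmodalAttention(nn.Module):
    """Learns per-modality attention weights and fuses audio/video features."""
    def __init__(self, audio_dim: int, video_dim: int, hidden_dim: int):
        super().__init__()
        # Project each modality into a shared feature space.
        self.audio_proj = nn.Linear(audio_dim, hidden_dim)
        self.video_proj = nn.Linear(video_dim, hidden_dim)
        # Produces one attention logit per modality from the joint features.
        self.attend = nn.Linear(2 * hidden_dim, 2)

    def forward(self, audio_feat, video_feat):
        a = torch.tanh(self.audio_proj(audio_feat))      # (B, H)
        v = torch.tanh(self.video_proj(video_feat))      # (B, H)
        logits = self.attend(torch.cat([a, v], dim=-1))  # (B, 2)
        alpha = torch.softmax(logits, dim=-1)            # attention over modalities
        # Fuse: attention-weighted sum of the projected modality features.
        fused = alpha[:, 0:1] * a + alpha[:, 1:2] * v    # (B, H)
        return fused, alpha

# Usage: shapes only; in a CASL-style agent the fused features would
# feed a recurrent option-critic policy.
attn = CrossmodalAttention(audio_dim=128, video_dim=256, hidden_dim=64)
fused, alpha = attn(torch.randn(4, 128), torch.randn(4, 256))
print(fused.shape, alpha.shape)  # torch.Size([4, 64]) torch.Size([4, 2])

Inspecting alpha at each timestep is what lets one check whether the agent attends to audio or video at the appropriate moments, as the abstract describes.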

