dc.contributor.author: Zhou, Bolei
dc.contributor.author: Andonian, Alexander Joseph
dc.contributor.author: Oliva, Aude
dc.contributor.author: Torralba, Antonio
dc.date.accessioned: 2020-05-07T20:19:00Z
dc.date.available: 2020-05-07T20:19:00Z
dc.date.issued: 2018-10
dc.identifier.isbn: 9783030012458
dc.identifier.isbn: 9783030012465
dc.identifier.issn: 0302-9743
dc.identifier.issn: 1611-3349
dc.identifier.uri: https://hdl.handle.net/1721.1/125123
dc.description.abstract: Temporal relational reasoning, the ability to link meaningful transformations of objects or entities over time, is a fundamental property of intelligent species. In this paper, we introduce an effective and interpretable network module, the Temporal Relation Network (TRN), designed to learn and reason about temporal dependencies between video frames at multiple time scales. We evaluate TRN-equipped networks on activity recognition tasks using three recent video datasets - Something-Something, Jester, and Charades - which fundamentally depend on temporal relational reasoning. Our results demonstrate that the proposed TRN gives convolutional neural networks a remarkable capacity to discover temporal relations in videos. Through only sparsely sampled video frames, TRN-equipped networks can accurately predict human-object interactions in the Something-Something dataset and identify various human gestures on the Jester dataset with very competitive performance. TRN-equipped networks also outperform two-stream networks and 3D convolution networks in recognizing daily activities in the Charades dataset. Further analyses show that the models learn intuitive and interpretable visual common sense knowledge in videos (Code and models are available at http://relation.csail.mit.edu/.). [en_US]
dc.description.sponsorship: DARPA XAI program No. FA8750-18-C-0004 [en_US]
dc.description.sponsorship: NSF Grant No. 152481 [en_US]
dc.description.sponsorship: Vannevar Bush Faculty Fellowship program funded by the ONR grant No. N00014-16-1-311 [en_US]
dc.description.sponsorship: Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior/Interior Business Center (DOI/IBC) contract number D17PC00341 [en_US]
dc.language.iso: en
dc.publisher: Springer International Publishing [en_US]
dc.relation.isversionof: http://dx.doi.org/10.1007/978-3-030-01246-5_49 [en_US]
dc.rights: Creative Commons Attribution-Noncommercial-Share Alike [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/ [en_US]
dc.source: arXiv [en_US]
dc.title: Temporal Relational Reasoning in Videos [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Zhou, Bolei, et al. "Temporal Relational Reasoning in Videos." European Conference on Computer Vision, 2018, Munich, Germany [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science [en_US]
dc.relation.journal: Computer Vision - ECCV 2018 [en_US]
dc.eprint.version: Author's final manuscript [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
eprint.status: http://purl.org/eprint/status/NonPeerReviewed [en_US]
dc.date.updated: 2019-07-11T17:27:12Z
dspace.date.submission: 2019-07-11T17:27:14Z
mit.metadata.status: Complete

