dc.contributor.author | Zhou, Bolei | |
dc.contributor.author | Andonian, Alexander Joseph | |
dc.contributor.author | Oliva, Aude | |
dc.contributor.author | Torralba, Antonio | |
dc.date.accessioned | 2020-05-07T20:19:00Z | |
dc.date.available | 2020-05-07T20:19:00Z | |
dc.date.issued | 2018-10 | |
dc.identifier.isbn | 9783030012458 | |
dc.identifier.isbn | 9783030012465 | |
dc.identifier.issn | 0302-9743 | |
dc.identifier.issn | 1611-3349 | |
dc.identifier.uri | https://hdl.handle.net/1721.1/125123 | |
dc.description.abstract | Temporal relational reasoning, the ability to link meaningful transformations of objects or entities over time, is a fundamental property of intelligent species. In this paper, we introduce an effective and interpretable network module, the Temporal Relation Network (TRN), designed to learn and reason about temporal dependencies between video frames at multiple time scales. We evaluate TRN-equipped networks on activity recognition tasks using three recent video datasets - Something-Something, Jester, and Charades - which fundamentally depend on temporal relational reasoning. Our results demonstrate that the proposed TRN gives convolutional neural networks a remarkable capacity to discover temporal relations in videos. Using only sparsely sampled video frames, TRN-equipped networks can accurately predict human-object interactions in the Something-Something dataset and identify various human gestures on the Jester dataset with very competitive performance. TRN-equipped networks also outperform two-stream networks and 3D convolution networks in recognizing daily activities in the Charades dataset. Further analyses show that the models learn intuitive and interpretable visual common sense knowledge in videos (Code and models are available at http://relation.csail.mit.edu/.). | en_US |
dc.description.sponsorship | DARPA XAI program No. FA8750-18-C-0004 | en_US |
dc.description.sponsorship | NSF Grant No. 152481 | en_US |
dc.description.sponsorship | Vannevar Bush Faculty Fellowship program funded by the ONR grant No. N00014-16-1-311 | en_US |
dc.description.sponsorship | Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior/Interior Business Center (DOI/IBC) contract number D17PC00341 | en_US |
dc.language.iso | en | |
dc.publisher | Springer International Publishing | en_US |
dc.relation.isversionof | http://dx.doi.org/10.1007/978-3-030-01246-5_49 | en_US |
dc.rights | Creative Commons Attribution-Noncommercial-Share Alike | en_US |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/4.0/ | en_US |
dc.source | arXiv | en_US |
dc.title | Temporal Relational Reasoning in Videos | en_US |
dc.type | Article | en_US |
dc.identifier.citation | Zhou, Bolei, et al. "Temporal Relational Reasoning in Videos." European Conference on Computer Vision, 2018, Munich, Germany | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | en_US |
dc.relation.journal | Computer Vision - ECCV 2018 | en_US |
dc.eprint.version | Author's final manuscript | en_US |
dc.type.uri | http://purl.org/eprint/type/ConferencePaper | en_US |
eprint.status | http://purl.org/eprint/status/NonPeerReviewed | en_US |
dc.date.updated | 2019-07-11T17:27:12Z | |
dspace.date.submission | 2019-07-11T17:27:14Z | |
mit.metadata.status | Complete | |