
dc.contributor.author: Lin, Ji
dc.contributor.author: Gan, Chuang
dc.contributor.author: Wang, Kuan
dc.contributor.author: Han, Song
dc.date.accessioned: 2022-06-30T18:32:11Z
dc.date.available: 2022-06-30T18:32:11Z
dc.date.issued: 2020
dc.identifier.uri: https://hdl.handle.net/1721.1/143616
dc.description.abstract: The explosive growth in video streaming requires video understanding at high accuracy and low computation cost. Conventional 2D CNNs are computationally cheap but cannot capture temporal relationships; 3D-CNN-based methods can achieve good performance but are computationally intensive. In this paper, we propose a generic and effective Temporal Shift Module (TSM) that enjoys both high efficiency and high performance. The key idea of TSM is to shift part of the channels along the temporal dimension, thus facilitating information exchange among neighboring frames. It can be inserted into 2D CNNs to achieve temporal modeling at zero computation and zero parameters. TSM offers several unique advantages. First, TSM has high performance; it ranked first on the Something-Something leaderboard upon submission. Second, TSM has high efficiency; it achieves high frame rates of 74 fps and 29 fps for online video recognition on Jetson Nano and Galaxy Note8, respectively. Third, TSM has higher scalability than 3D networks, enabling large-scale Kinetics training on 1,536 GPUs in 15 minutes. Lastly, TSM enables learning of action concepts, which 2D networks cannot model; we visualize the category attention map and find that a spatio-temporal action detector emerges during the training of classification tasks. The code is publicly available.
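The core operation described in the abstract, shifting a fraction of channels along the temporal dimension at zero extra parameters, can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the authors' released implementation; the (N, T, C, H, W) tensor layout, the `temporal_shift` function name, the 1/8 shift fraction, and zero-padding at clip borders are assumptions made for the example.

```python
import numpy as np

def temporal_shift(x, shift_div=8):
    """Shift a fraction of channels along the temporal axis.

    x: activations of shape (N, T, C, H, W), where T is the number of
    frames. The first C/shift_div channels are shifted one step toward
    the past, the next C/shift_div one step toward the future, and the
    remaining channels are left untouched. Borders are zero-padded.
    Pure data movement: no multiplications, no learnable parameters.
    """
    n, t, c, h, w = x.shape
    fold = c // shift_div
    out = np.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]                   # shift backward in time
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]   # shift forward in time
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]              # unshifted channels
    return out
```

Because the operation is pure indexing, inserting it before the convolutions of a 2D CNN block gives each frame access to features of its neighbors without changing the network's FLOPs or parameter count.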
dc.language.iso: en
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE)
dc.relation.isversionof: 10.1109/TPAMI.2020.3029799
dc.rights: Creative Commons Attribution-Noncommercial-Share Alike
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/
dc.source: arXiv
dc.title: TSM: Temporal Shift Module for Efficient and Scalable Video Understanding on Edge Devices
dc.type: Article
dc.identifier.citation: Lin, Ji, Gan, Chuang, Wang, Kuan and Han, Song. 2020. "TSM: Temporal Shift Module for Efficient and Scalable Video Understanding on Edge Devices." IEEE Transactions on Pattern Analysis and Machine Intelligence, 18 (6).
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.contributor.department: MIT-IBM Watson AI Lab
dc.relation.journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
dc.eprint.version: Author's final manuscript
dc.type.uri: http://purl.org/eprint/type/JournalArticle
eprint.status: http://purl.org/eprint/status/PeerReviewed
dc.date.updated: 2022-06-30T17:28:29Z
dspace.orderedauthors: Lin, J; Gan, C; Wang, K; Han, S
dspace.date.submission: 2022-06-30T17:28:37Z
mit.journal.volume: 18
mit.journal.issue: 6
mit.license: OPEN_ACCESS_POLICY
mit.metadata.status: Authority Work and Publication Information Needed

