
dc.contributor.author: Lin, Ji
dc.contributor.author: Gan, Chuang
dc.contributor.author: Wang, Kuan
dc.contributor.author: Han, Song
dc.date.accessioned: 2022-06-30T18:32:11Z
dc.date.available: 2022-06-30T18:32:11Z
dc.date.issued: 2020
dc.identifier.uri: https://hdl.handle.net/1721.1/143616
dc.description.abstract: The explosive growth in video streaming requires video understanding at high accuracy and low computation cost. Conventional 2D CNNs are computationally cheap but cannot capture temporal relationships; 3D-CNN-based methods can achieve good performance but are computationally intensive. In this paper, we propose a generic and effective Temporal Shift Module (TSM) that enjoys both high efficiency and high performance. The key idea of TSM is to shift part of the channels along the temporal dimension, thus facilitating information exchange among neighboring frames. It can be inserted into 2D CNNs to achieve temporal modeling at zero computation and zero parameters. TSM offers several unique advantages. First, TSM has high performance; it ranked first on the Something-Something leaderboard upon submission. Second, TSM has high efficiency; it achieves high frame rates of 74 fps and 29 fps for online video recognition on Jetson Nano and Galaxy Note8, respectively. Third, TSM has higher scalability than 3D networks, enabling large-scale Kinetics training on 1,536 GPUs in 15 minutes. Lastly, TSM enables learning of action concepts, which 2D networks cannot model; we visualize the category attention map and find that a spatio-temporal action detector emerges during the training of classification tasks. The code is publicly available.
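The core operation described in the abstract, shifting a fraction of channels along the temporal dimension at zero extra parameters, can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the authors' released implementation; the (N, T, C, H, W) tensor layout, the `temporal_shift` function name, the 1/8 shift fraction, and zero-padding at clip borders are assumptions made for the example.

```python
import numpy as np

def temporal_shift(x, shift_div=8):
    """Shift a fraction of channels along the temporal axis.

    x: activations of shape (N, T, C, H, W), where T is the number of
    frames. The first C/shift_div channels are shifted one step toward
    the past, the next C/shift_div one step toward the future, and the
    remaining channels are left untouched. Borders are zero-padded.
    Pure data movement: no multiplications, no learnable parameters.
    """
    n, t, c, h, w = x.shape
    fold = c // shift_div
    out = np.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]                   # shift backward in time
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]   # shift forward in time
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]              # unshifted channels
    return out
```

Because the operation is pure indexing, inserting it before the convolutions of a 2D CNN block gives each frame access to features of its neighbors without changing the network's FLOPs or parameter count.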
dc.language.iso: en
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE)
dc.relation.isversionof: 10.1109/TPAMI.2020.3029799
dc.rights: Creative Commons Attribution-Noncommercial-Share Alike
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/
dc.source: arXiv
dc.title: TSM: Temporal Shift Module for Efficient and Scalable Video Understanding on Edge Devices
dc.type: Article
dc.identifier.citation: Lin, Ji, Gan, Chuang, Wang, Kuan and Han, Song. 2020. "TSM: Temporal Shift Module for Efficient and Scalable Video Understanding on Edge Devices." IEEE Transactions on Pattern Analysis and Machine Intelligence, 18 (6).
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.contributor.department: MIT-IBM Watson AI Lab
dc.relation.journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
dc.eprint.version: Author's final manuscript
dc.type.uri: http://purl.org/eprint/type/JournalArticle
eprint.status: http://purl.org/eprint/status/PeerReviewed
dc.date.updated: 2022-06-30T17:28:29Z
dspace.orderedauthors: Lin, J; Gan, C; Wang, K; Han, S
dspace.date.submission: 2022-06-30T17:28:37Z
mit.journal.volume: 18
mit.journal.issue: 6
mit.license: OPEN_ACCESS_POLICY
mit.metadata.status: Authority Work and Publication Information Needed

