TSM: Temporal Shift Module for Efficient Video Understanding

Lin, Ji; Gan, Chuang; Han, Song

dc.contributor.author	Lin, Ji
dc.contributor.author	Gan, Chuang
dc.contributor.author	Han, Song
dc.date.accessioned	2022-06-30T17:26:01Z
dc.date.available	2022-06-30T17:26:01Z
dc.date.issued	2019
dc.identifier.uri	https://hdl.handle.net/1721.1/143615
dc.description.abstract	© 2019 IEEE. The explosive growth in video streaming gives rise to challenges in performing video understanding at high accuracy and low computation cost. Conventional 2D CNNs are computationally cheap but cannot capture temporal relationships; 3D CNN based methods can achieve good performance but are computationally intensive, making them expensive to deploy. In this paper, we propose a generic and effective Temporal Shift Module (TSM) that enjoys both high efficiency and high performance. Specifically, it can achieve the performance of a 3D CNN while maintaining a 2D CNN's complexity. TSM shifts part of the channels along the temporal dimension, thus facilitating information exchange among neighboring frames. It can be inserted into 2D CNNs to achieve temporal modeling at zero computation and zero parameters. We also extend TSM to the online setting, which enables real-time, low-latency online video recognition and video object detection. TSM is accurate and efficient: it ranked first on the Something-Something leaderboard upon publication; on Jetson Nano and Galaxy Note8, it achieves a low latency of 13ms and 35ms for online video recognition. The code is available at: https://github.com/mit-han-lab/temporal-shift-module.	en_US
dc.language.iso	en
dc.publisher	IEEE	en_US
dc.relation.isversionof	10.1109/ICCV.2019.00718	en_US
dc.rights	Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.	en_US
dc.source	Computer Vision Foundation	en_US
dc.title	TSM: Temporal Shift Module for Efficient Video Understanding	en_US
dc.type	Article	en_US
dc.identifier.citation	Lin, Ji, Gan, Chuang and Han, Song. 2019. "TSM: Temporal Shift Module for Efficient Video Understanding." Proceedings of the IEEE International Conference on Computer Vision, 2019-October.
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.contributor.department	MIT-IBM Watson AI Lab
dc.relation.journal	Proceedings of the IEEE International Conference on Computer Vision	en_US
dc.eprint.version	Author's final manuscript	en_US
dc.type.uri	http://purl.org/eprint/type/ConferencePaper	en_US
eprint.status	http://purl.org/eprint/status/NonPeerReviewed	en_US
dc.date.updated	2022-06-30T17:03:24Z
dspace.orderedauthors	Lin, J; Gan, C; Han, S	en_US
dspace.date.submission	2022-06-30T17:03:35Z
mit.journal.volume	2019-October	en_US
mit.license	PUBLISHER_POLICY
mit.metadata.status	Authority Work and Publication Information Needed	en_US
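
The channel shift described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a bidirectional shift with zero padding at the sequence boundaries, and the function name `temporal_shift` and the parameter `channels_per_direction` are hypothetical names chosen for illustration. Features are represented as plain nested lists, `frames[t][c]`, rather than tensors.

```python
def temporal_shift(frames, channels_per_direction):
    """Shift some channels along the time axis (illustrative sketch).

    frames[t][c] is the feature value of channel c at frame t.
    The first `channels_per_direction` channels receive values from the
    next frame, the following `channels_per_direction` channels receive
    values from the previous frame, and all other channels are unchanged.
    Positions with no neighboring frame are filled with 0 (assumption).
    """
    num_frames = len(frames)
    k = channels_per_direction
    # Copy every frame so that the input is left unmodified.
    shifted = [list(frame) for frame in frames]
    for t in range(num_frames):
        for c in range(k):
            # Pull from the future frame t + 1.
            shifted[t][c] = frames[t + 1][c] if t + 1 < num_frames else 0
        for c in range(k, 2 * k):
            # Pull from the past frame t - 1.
            shifted[t][c] = frames[t - 1][c] if t >= 1 else 0
    return shifted


# Three frames with four channels each; one channel is shifted per direction.
frames = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
print(temporal_shift(frames, 1))
# -> [[5, 0, 3, 4], [9, 2, 7, 8], [0, 6, 11, 12]]
```

The shift only moves existing values between frames, so it adds no multiplications and no learnable parameters. This matches the abstract's claim that temporal modeling comes at zero computation and zero parameters when the module is inserted into a 2D CNN.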

Files in this item

Name: 2109.13227.pdf
Size: 9.649Mb
Format: PDF
Description: Accepted version

This item appears in the following Collection(s)

MIT Open Access Articles
