TSM: Temporal Shift Module for Efficient Video Understanding

22 Aug 2019 | Ji Lin, Chuang Gan, Song Han
The Temporal Shift Module (TSM) is an approach to efficient video understanding that combines the computational efficiency of 2D CNNs with the temporal modeling capability of 3D CNNs. TSM shifts part of the feature channels along the temporal dimension so that information is exchanged between neighboring frames; inserted into a 2D CNN, it adds temporal modeling at zero extra computation and zero extra parameters.

TSM supports both offline and online video recognition: a bi-directional variant mixes information from past and future frames for offline processing, while a uni-directional variant uses only past frames, enabling real-time, low-latency online video recognition and video object detection. Because TSM relies only on operations already used by 2D CNNs, which are well optimized in both software and hardware, it is hardware-efficient and easy to deploy.

At the time of publication, TSM ranked first on the Something-Something leaderboard and achieved state-of-the-art results on multiple video recognition datasets. On edge devices it runs with low latency for online recognition, 13 ms on a Jetson Nano and 35 ms on a Galaxy Note8, making it a practical solution for both low-latency online recognition and high-throughput offline processing.
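As a concrete illustration, the sketch below (PyTorch) shows how the shift can be implemented as pure tensor indexing, with no multiply-adds and no learned parameters. The function names, the 1/8-per-direction shift proportion (`fold_div=8`), and the caching scheme for the online variant are illustrative assumptions based on the paper's description, not the authors' exact code.

```python
import torch

def temporal_shift(x, n_segments, fold_div=8):
    """Bi-directional temporal shift (offline setting), a minimal sketch.

    x: features of shape [N * T, C, H, W], where T = n_segments frames per clip.
    fold_div: each direction shifts C // fold_div channels (assumed 1/8 here).
    """
    nt, c, h, w = x.size()
    n = nt // n_segments
    x = x.view(n, n_segments, c, h, w)
    fold = c // fold_div

    out = torch.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]                   # pull channels from the next frame
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]   # pull channels from the previous frame
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]              # remaining channels stay in place
    return out.view(nt, c, h, w)

def online_shift(x, cache, fold_div=8):
    """Uni-directional shift (online setting): only past frames are used.

    x: features of the current frame, shape [N, C, H, W].
    cache: the first C // fold_div channels saved from the previous frame
           (an assumed caching scheme for illustration).
    """
    fold = x.size(1) // fold_div
    out = x.clone()
    out[:, :fold] = cache                # bring in channels cached from the previous frame
    new_cache = x[:, :fold].detach()     # save current channels for the next time step
    return out, new_cache

# Usage sketch: a batch of 2 clips with 8 frames each, 64-channel features.
feat = torch.randn(2 * 8, 64, 28, 28)
shifted = temporal_shift(feat, n_segments=8)
```

Because the shift is just index copying, the only cost is data movement, which is why inserting it inside residual branches of an existing 2D backbone adds temporal modeling without increasing FLOPs or parameter count.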