Temporal Action Detection with Structured Segment Networks

This paper introduces the Structured Segment Network (SSN), a framework for temporal action detection. SSN models the temporal structure of each action instance with a structured temporal pyramid. On top of this pyramid sits a decomposed discriminative model with two classifiers: one classifies the action and the other determines completeness. Together they let the framework distinguish positive proposals from background and incomplete ones, which improves both recognition and localization. The components are integrated into a unified network that is trained end-to-end, using sparse snippet sampling to reduce computational cost. The paper also devises a simple temporal action proposal scheme, Temporal Actionness Grouping (TAG), which uses an actionness classifier to generate high-quality proposals. On two large-scale benchmarks, THUMOS14 and ActivityNet, SSN outperforms previous state-of-the-art methods and adapts well to actions with varied temporal structures.
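
To make the two-classifier idea concrete, here is a minimal sketch of how an action score and a completeness score might be combined for one proposal. It is an illustration under stated assumptions, not the authors' implementation: the function name `detection_scores`, the convention that index 0 is the background class, and the choice to multiply the two probabilities are all assumptions introduced here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def detection_scores(activity_logits, completeness_logits):
    """Combine both classifier outputs for a single proposal.

    activity_logits: K+1 values; index 0 is assumed to be background.
    completeness_logits: K values, one per action class.
    Returns K scores, each P(class) * P(complete | class) (assumed rule).
    """
    class_probs = softmax(activity_logits)
    return [
        class_probs[k + 1] * sigmoid(completeness_logits[k])
        for k in range(len(completeness_logits))
    ]


# Hypothetical logits for a 2-class problem.
complete = detection_scores([0.0, 4.0, 0.0], [3.0, 0.0])
incomplete = detection_scores([0.0, 4.0, 0.0], [-3.0, 0.0])
background = detection_scores([4.0, 0.0, 0.0], [3.0, 3.0])
```

With these made-up inputs, the proposal that covers the whole action ranks above the incomplete one even though both have the same action logits, and the background proposal scores low for every class. That is the separation the summary attributes to the decomposed model.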

18 Sep 2017 | Yue Zhao, Yuanjun Xiong, Limin Wang, Zhirong Wu, Xiaoou Tang, and Dahua Lin