Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors

19 May 2015 | Limin Wang, Yu Qiao, Xiaoou Tang
This paper introduces a novel video representation called Trajectory-Pooled Deep-Convolutional Descriptor (TDD), which combines the strengths of both hand-crafted and deep-learned features. TDD leverages deep architectures to learn discriminative convolutional feature maps and employs trajectory-constrained pooling to aggregate these features into effective descriptors. To enhance robustness, two normalization methods—spatiotemporal normalization and channel normalization—are designed. The proposed TDDs outperform both hand-crafted features and deep-learned features on challenging datasets like HMDB51 and UCF101, demonstrating superior performance and complementary properties with low-level local features. The effectiveness of TDDs is validated through extensive experiments and comparisons with state-of-the-art methods.
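The two normalization schemes and trajectory-constrained pooling mentioned above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names, the `(T, H, W, C)` feature-map layout, and the use of sum-pooling along trajectory points are assumptions for the sake of the example.

```python
import numpy as np

def spatiotemporal_normalize(fmap, eps=1e-8):
    """Divide each channel by its max over the whole spatiotemporal
    volume, so activations of different channels are comparable.
    fmap: array of shape (T, H, W, C) — assumed layout."""
    vmax = fmap.max(axis=(0, 1, 2), keepdims=True)
    return fmap / (vmax + eps)

def channel_normalize(fmap, eps=1e-8):
    """Divide each spatiotemporal position by its max across channels,
    so the channel responses at each position are comparable."""
    vmax = fmap.max(axis=3, keepdims=True)
    return fmap / (vmax + eps)

def trajectory_pool(fmap, trajectory):
    """Sum-pool normalized feature vectors along one tracked trajectory.
    trajectory: list of (t, y, x) points (already mapped to feature-map
    coordinates). Returns one C-dimensional descriptor."""
    return sum(fmap[t, y, x] for (t, y, x) in trajectory)
```

In the paper's pipeline, the trajectories come from improved dense trajectories and the feature maps from a two-stream ConvNet; here both are treated as given inputs.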