Temporal Convolutional Networks for Action Segmentation and Detection

16 Nov 2016 | Colin Lea, Michael D. Flynn, René Vidal, Austin Reiter, Gregory D. Hager
Temporal Convolutional Networks (TCNs) are introduced for action segmentation and detection in videos. TCNs use hierarchical temporal convolutions to capture long-range temporal patterns and are significantly faster to train than LSTM-based models. The paper presents two TCN variants: Encoder-Decoder TCN (ED-TCN) and Dilated TCN. ED-TCN uses pooling and upsampling to efficiently capture long-range patterns, while Dilated TCN uses dilated convolutions and skip connections. Both models outperform state-of-the-art methods on three challenging datasets, achieving significant improvements in action segmentation and detection. TCNs are capable of capturing action compositions, segment durations, and long-range dependencies. The paper also introduces a segmental F1 score for evaluating action segmentation and detection tasks. Experiments show that TCNs outperform Bi-LSTM baselines, with ED-TCN producing fewer over-segmentation errors. The models are evaluated on datasets including 50 Salads, MERL Shopping, and Georgia Tech Egocentric Activities. TCNs are shown to be effective in capturing complex temporal patterns and are a strong alternative to RNNs. The paper concludes that TCNs are a promising approach for action segmentation and detection in video analysis.
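To make the two variants concrete, below is a minimal PyTorch sketch of an ED-TCN-style encoder-decoder (temporal convolution, pooling, then upsampling back to per-frame scores) and a Dilated-TCN-style stack (dilated convolutions with residual and skip connections). The framework choice, layer widths, kernel sizes, and depth are illustrative assumptions, not the authors' released implementation.

```python
# Illustrative sketch only; hyperparameters and normalization details differ from the paper.
import torch
import torch.nn as nn

class EDTCN(nn.Module):
    """ED-TCN-style model: conv + pooling encoder, upsampling + conv decoder.

    Input:  (batch, feature_dim, time) per-frame features.
    Output: (batch, num_classes, time) per-frame class scores.
    """
    def __init__(self, feature_dim, num_classes, hidden=(64, 96), kernel_size=25):
        super().__init__()
        pad = kernel_size // 2
        # Encoder: each stage halves the temporal resolution.
        self.enc1 = nn.Sequential(
            nn.Conv1d(feature_dim, hidden[0], kernel_size, padding=pad), nn.ReLU(),
            nn.MaxPool1d(2))
        self.enc2 = nn.Sequential(
            nn.Conv1d(hidden[0], hidden[1], kernel_size, padding=pad), nn.ReLU(),
            nn.MaxPool1d(2))
        # Decoder: upsampling restores the temporal resolution.
        self.dec1 = nn.Sequential(
            nn.Upsample(scale_factor=2),
            nn.Conv1d(hidden[1], hidden[1], kernel_size, padding=pad), nn.ReLU())
        self.dec2 = nn.Sequential(
            nn.Upsample(scale_factor=2),
            nn.Conv1d(hidden[1], hidden[0], kernel_size, padding=pad), nn.ReLU())
        # 1x1 convolution gives per-frame class scores.
        self.classifier = nn.Conv1d(hidden[0], num_classes, kernel_size=1)

    def forward(self, x):
        x = self.enc2(self.enc1(x))
        x = self.dec2(self.dec1(x))
        return self.classifier(x)

class DilatedTCN(nn.Module):
    """Dilated-TCN-style model: stacked dilated convolutions with residual
    connections between layers and skip connections summed at the output."""
    def __init__(self, feature_dim, num_classes, hidden=64, num_layers=4):
        super().__init__()
        self.input_proj = nn.Conv1d(feature_dim, hidden, kernel_size=1)
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            dilation = 2 ** i  # receptive field grows exponentially with depth
            self.layers.append(
                nn.Conv1d(hidden, hidden, kernel_size=3,
                          dilation=dilation, padding=dilation))
        self.classifier = nn.Conv1d(hidden, num_classes, kernel_size=1)

    def forward(self, x):
        x = self.input_proj(x)
        skip_sum = 0
        for conv in self.layers:
            residual = x
            x = torch.relu(conv(x))
            skip_sum = skip_sum + x   # skip connection to the output
            x = x + residual          # residual connection to the next layer
        return self.classifier(skip_sum)

# Example: 128-dim per-frame features, 30 action classes, a clip of 256 frames.
if __name__ == "__main__":
    feats = torch.randn(2, 128, 256)
    print(EDTCN(128, 30)(feats).shape)       # torch.Size([2, 30, 256])
    print(DilatedTCN(128, 30)(feats).shape)  # torch.Size([2, 30, 256])
```

Both sketches keep the output at the input frame rate, so a per-frame cross-entropy loss can be applied directly; the encoder-decoder variant trades temporal resolution inside the network for a large effective receptive field, while the dilated variant grows its receptive field without any pooling.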