16 Nov 2016 | Colin Lea, Michael D. Flynn, René Vidal, Austin Reiter, Gregory D. Hager
The paper introduces Temporal Convolutional Networks (TCNs) for fine-grained action segmentation and detection in videos. TCNs use a hierarchy of temporal convolutions to capture long-range temporal patterns, addressing the limitations of traditional methods that decouple feature extraction from temporal classification. Two types of TCN are presented: the Encoder-Decoder TCN (ED-TCN) and the Dilated TCN. The ED-TCN uses pooling and upsampling to efficiently capture long-range temporal patterns, while the Dilated TCN uses dilated convolutions. Both models outperform state-of-the-art methods, including LSTM-based models, in accuracy and speed. The paper evaluates them on three challenging datasets: MERL Shopping, Georgia Tech Egocentric Activities, and 50 Salads, demonstrating significant improvements over existing approaches.
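To make the idea of dilated temporal convolutions concrete, here is a minimal pure-Python sketch of how stacking them widens the temporal context seen by each output frame. It is an illustration only, not the paper's architecture: the single-channel signal, the kernel size of 3, the zero padding, and the function names `dilated_conv1d` and `receptive_field` are all assumptions made for this example.

```python
def dilated_conv1d(x, weights, dilation):
    """Single-channel 1D convolution whose taps are spaced `dilation` frames apart.

    Out-of-range taps are treated as zeros, so the output has the same
    length as the input (an assumption of this sketch).
    """
    center = len(weights) // 2
    out = []
    for t in range(len(x)):
        total = 0.0
        for j, w in enumerate(weights):
            idx = t + (j - center) * dilation
            if 0 <= idx < len(x):
                total += w * x[idx]
        out.append(total)
    return out


def receptive_field(kernel_size, dilations):
    """Number of input frames that can influence one output frame of the stack."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    dilations = [1, 2, 4, 8]  # doubling the dilation per layer (illustrative choice)
    print(receptive_field(3, dilations))  # 31

    # Push a single impulse through the stack to see how far it spreads.
    signal = [0.0] * 31
    signal[15] = 1.0
    for d in dilations:
        signal = dilated_conv1d(signal, [1.0, 1.0, 1.0], d)
    print(sum(1 for v in signal if v != 0.0))  # 31 frames touched
```

With four layers the context grows to 31 frames, whereas four undilated layers with the same kernel would cover only 9. The ED-TCN reaches long temporal context by a different route, through pooling and upsampling, which this sketch does not cover.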
The authors also introduce a segmental F1 score, which is more applicable to real-world concerns, and provide implementation details and ablation studies to support their findings.
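The following is a minimal sketch of one way a segmental F1 score can be computed, assuming the usual formulation: frame-wise labels are collapsed into segments, a predicted segment counts as a true positive when its intersection-over-union (IoU) with an unmatched ground-truth segment of the same class meets a threshold, and any further predictions on an already matched segment count as false positives. The function names, the 0.5 default threshold, and the treatment of every label (including any background label) as an ordinary class are assumptions of this example, not the authors' reference implementation.

```python
def segments(labels):
    """Collapse frame-wise labels into (label, start, end) runs; end is exclusive."""
    segs = []
    start = 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            segs.append((labels[start], start, t))
            start = t
    return segs


def segmental_f1(pred, truth, iou_threshold=0.5):
    """F1 over segments, matching each true segment to at most one prediction."""
    pred_segs, true_segs = segments(pred), segments(truth)
    matched = [False] * len(true_segs)
    tp = fp = 0
    for label, ps, pe in pred_segs:
        best_iou, best_idx = 0.0, -1
        for i, (tl, ts, te) in enumerate(true_segs):
            if tl != label:
                continue
            inter = max(0, min(pe, te) - max(ps, ts))
            # Exact union when the segments overlap; when they do not,
            # inter is 0 and the IoU is 0 regardless of this value.
            union = max(pe, te) - min(ps, ts)
            iou = inter / union
            if iou > best_iou:
                best_iou, best_idx = iou, i
        if best_idx >= 0 and best_iou >= iou_threshold and not matched[best_idx]:
            tp += 1
            matched[best_idx] = True
        else:
            fp += 1
    fn = matched.count(False)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    truth = [0, 0, 0, 0, 1, 1, 1, 1]
    shifted = [0, 0, 0, 1, 1, 1, 1, 1]         # boundary off by one frame
    oversegmented = [0, 1, 0, 0, 1, 1, 0, 1]   # spurious short segments
    print(round(segmental_f1(shifted, truth), 3))        # 1.0
    print(round(segmental_f1(oversegmented, truth), 3))  # 0.5
```

In this toy example the shifted prediction gets 7 of 8 frames right and the over-segmented one 6 of 8, so frame-wise accuracy barely separates them. The segmental score does: it leaves the small boundary shift unpenalized and halves the score for the fragmented prediction.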