[slides and audio] SlowFast Networks for Video Recognition

SlowFast networks are introduced for video recognition, combining a Slow pathway for spatial semantics and a Fast pathway for motion at high temporal resolution. The Slow pathway processes low frame rate videos to capture spatial information, while the Fast pathway, with reduced channel capacity, captures fast motion. The two pathways are fused via lateral connections, enabling efficient video recognition. The model achieves state-of-the-art performance on benchmarks like Kinetics, Charades, and AVA. The Fast pathway is lightweight, allowing high temporal resolution without significant computational cost. The SlowFast concept improves performance in both action classification and detection. The model is trained from scratch without pre-training and shows strong results on various video datasets. It outperforms existing methods in terms of accuracy and efficiency, with significant improvements in action detection on the AVA dataset. The architecture is inspired by biological studies of retinal ganglion cells, where different cell types handle different aspects of visual processing. The SlowFast model is effective for video recognition, offering a flexible and efficient approach to capturing both spatial and temporal information in videos.SlowFast networks are introduced for video recognition, combining a Slow pathway for spatial semantics and a Fast pathway for motion at high temporal resolution. The Slow pathway processes low frame rate videos to capture spatial information, while the Fast pathway, with reduced channel capacity, captures fast motion. The two pathways are fused via lateral connections, enabling efficient video recognition. The model achieves state-of-the-art performance on benchmarks like Kinetics, Charades, and AVA. The Fast pathway is lightweight, allowing high temporal resolution without significant computational cost. The SlowFast concept improves performance in both action classification and detection. The model is trained from scratch without pre-training and shows strong results on various video datasets. It outperforms existing methods in terms of accuracy and efficiency, with significant improvements in action detection on the AVA dataset. The architecture is inspired by biological studies of retinal ganglion cells, where different cell types handle different aspects of visual processing. The SlowFast model is effective for video recognition, offering a flexible and efficient approach to capturing both spatial and temporal information in videos.

SlowFast Networks for Video Recognition

29 Oct 2019 | Christoph Feichtenhofer Haoqi Fan Jitendra Malik Kaiming He