Efficient One-stage Video Object Detection by Exploiting Temporal Consistency

14 Feb 2024 | Guanxiong Sun, Yang Hua, Guosheng Hu, and Neil Robertson
This paper proposes an efficient one-stage video object detection (VOD) framework that exploits temporal consistency across video frames. The method addresses the computational bottlenecks of one-stage VOD with two modules: a location prior network (LPN) and a size prior network (SPN). The LPN filters out background regions using the detection results of the previous frame, reducing the number of pixels processed by the attention module. The SPN skips unnecessary computation on low-level feature maps when the previous frame contains no small objects. Together, the two modules significantly reduce computational cost. The framework is evaluated on several one-stage detectors, including FCOS, CenterNet, and YOLOX, and achieves a superior speed-accuracy trade-off while remaining compatible across detectors. The results show that the proposed method outperforms existing state-of-the-art VOD methods in both efficiency and accuracy.
The code is available at https://github.com/guanxiongsun/vfe.pytorch.
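The two temporal-prior ideas above can be illustrated with a minimal sketch. Note this is a toy illustration, not the paper's implementation: the actual LPN and SPN are learned networks, whereas the helpers below (`location_prior_mask`, `skip_low_level`, and their parameters) are hypothetical and use hard geometric rules derived from the previous frame's boxes.

```python
import numpy as np

def location_prior_mask(prev_boxes, feat_h, feat_w, stride, margin=1):
    """LPN-style prior: boolean mask of feature-map cells near the
    previous frame's detections. Only these cells would be fed to the
    attention module; background cells are filtered out.

    prev_boxes: list of (x1, y1, x2, y2) boxes in image pixels.
    stride:     downsampling factor of the feature map.
    margin:     extra cells kept around each box to tolerate motion.
    """
    mask = np.zeros((feat_h, feat_w), dtype=bool)
    for x1, y1, x2, y2 in prev_boxes:
        cx1 = max(int(x1 // stride) - margin, 0)
        cy1 = max(int(y1 // stride) - margin, 0)
        cx2 = min(int(x2 // stride) + margin, feat_w - 1)
        cy2 = min(int(y2 // stride) + margin, feat_h - 1)
        mask[cy1:cy2 + 1, cx1:cx2 + 1] = True
    return mask

def skip_low_level(prev_boxes, small_thresh=32 * 32):
    """SPN-style decision: skip computation on the finest (low-level)
    feature map when the previous frame contained no small objects
    (area below small_thresh pixels^2)."""
    return all((x2 - x1) * (y2 - y1) >= small_thresh
               for x1, y1, x2, y2 in prev_boxes)

# Example: one 64x64 detection on a 640x640 frame, viewed on a
# stride-16 feature map (40x40 cells).
prev = [(100, 100, 164, 164)]
mask = location_prior_mask(prev, feat_h=40, feat_w=40, stride=16)
print(mask.sum(), "of", mask.size, "cells kept for attention")
print("skip low-level map:", skip_low_level(prev))
```

In this example only a small fraction of the 1,600 feature-map cells survives the location prior, which is the source of the computational savings the abstract describes; the size prior additionally drops the entire low-level branch because the single object is larger than the small-object threshold.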