3 Jan 2024 | Yaozong Zheng, Bineng Zhong, Qihua Liang, Zhiyi Mo, Shengping Zhang, Xianxian Li
ODTrack is a novel video-level framework for visual tracking that uses online dense temporal token learning to associate contextual relationships across video frames. Unlike traditional methods that rely on sparse temporal relationships between reference and search frames, ODTrack processes video sequences of arbitrary length to capture the spatio-temporal trajectory of an instance. It compresses the discriminative features of a target into a token sequence and propagates these tokens across video frames in an auto-regressive manner to achieve frame-to-frame association. This dense, video-level association of contextual relationships improves tracking in long-term scenarios, and because token propagation replaces complex online update strategies, it also yields a more efficient model representation and computation. The framework introduces two temporal token propagation attention mechanisms that effectively capture spatio-temporal trajectory information. ODTrack achieves state-of-the-art performance on seven benchmarks (LaSOT, TrackingNet, GOT10K, LaSOT_ext, VOT2020, TNL2K, and OTB100) while running at real-time speed, demonstrating its effectiveness in video-level tracking.
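The core idea, compressing the target into a token that is updated and handed forward frame by frame, can be sketched as follows. This is a minimal illustration, not ODTrack's actual architecture: the function names, feature dimensions, and the use of plain scaled dot-product cross-attention (rather than the paper's two specific token propagation attention mechanisms) are assumptions for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, keys, values):
    # scaled dot-product attention: the token (query) attends
    # over the current frame's features (keys/values)
    scores = query @ keys.T / np.sqrt(query.shape[-1])
    return softmax(scores) @ values

def propagate_token(token, frame_feats):
    # hypothetical single propagation step: refresh the temporal token
    # with the current frame's context, then hand it to the next frame
    return cross_attention(token, frame_feats, frame_feats)

rng = np.random.default_rng(0)
d = 16
token = rng.standard_normal((1, d))                        # compressed target token
video = [rng.standard_normal((64, d)) for _ in range(5)]   # per-frame features

# auto-regressive propagation: the token carries trajectory
# context from each frame into the next
for feats in video:
    token = propagate_token(token, feats)

print(token.shape)  # (1, 16)
```

The key property this sketch shows is that per-frame cost stays constant regardless of video length: each frame only ever attends between a fixed-size token and its own features, which is why arbitrary-length sequences can be processed without an explicit online update step.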