OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning

14 Mar 2024 | Lingyi Hong, Shilin Yan, Renrui Zhang, Wanyun Li, Xinyu Zhou, Pinxue Guo, Kaixun Jiang, Yiting Chen, Jinglun Li, Zhaoyu Chen, Wenqiang Zhang
OneTracker is a unified framework for visual object tracking that brings RGB tracking and RGB+X tracking tasks under a single model. It consists of two components: a Foundation Tracker and a Prompt Tracker. The Foundation Tracker is pre-trained on large-scale RGB tracking datasets to develop strong temporal matching ability. The Prompt Tracker is then adapted to RGB+X tracking tasks through parameter-efficient fine-tuning.

The Prompt Tracker incorporates additional modalities, such as language descriptions, masks, depth maps, thermal maps, and event maps, through Cross Modality Tracking Prompters (CMT Prompters) and Tracking Task Perception Transformer (TTP Transformer) layers. Because only these lightweight components are trained, the model adapts to each downstream task with few additional parameters while preserving the Foundation Tracker's strong localization ability.

By casting RGB, RGB+N (language), RGB+M (mask), RGB+D (depth), RGB+T (thermal), and RGB+E (event) tracking in a consistent format, OneTracker achieves state-of-the-art performance on multiple tracking benchmarks, demonstrating its effectiveness across diverse tracking tasks.
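To make the prompting idea concrete, here is a minimal PyTorch sketch of how a CMT Prompter might inject an auxiliary modality into the RGB token stream of a frozen Foundation Tracker. The layer names, dimensions, and bottleneck design are illustrative assumptions for this summary, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class CMTPrompter(nn.Module):
    """Illustrative Cross Modality Tracking Prompter (dimensions are assumptions).

    Projects patch tokens from an auxiliary modality (depth, thermal, event,
    mask, ...) through a small bottleneck and adds them to the RGB tokens,
    so the frozen Foundation Tracker can consume RGB+X input while only a
    few new parameters are trained.
    """

    def __init__(self, dim: int = 768, bottleneck: int = 32):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)  # down-project to bottleneck
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)    # up-project back to token dim

    def forward(self, rgb_tokens: torch.Tensor, x_tokens: torch.Tensor) -> torch.Tensor:
        # Both inputs: (batch, num_patches, dim).
        prompt = self.up(self.act(self.down(x_tokens)))
        return rgb_tokens + prompt  # modality cue injected as an additive prompt


def freeze_foundation_tracker(tracker: nn.Module) -> None:
    """Parameter-efficient tuning: freeze the pre-trained backbone so only
    prompter (and, per the paper, TTP Transformer) weights receive gradients."""
    for p in tracker.parameters():
        p.requires_grad = False


# Usage: fuse depth tokens into RGB tokens for an RGB+D sequence.
prompter = CMTPrompter(dim=768)
rgb = torch.randn(2, 196, 768)    # hypothetical RGB patch tokens
depth = torch.randn(2, 196, 768)  # hypothetical depth patch tokens
fused = prompter(rgb, depth)      # (2, 196, 768), fed to the frozen tracker
```

The additive bottleneck mirrors common adapter designs: the trainable parameter count stays small relative to the full backbone, which is the essence of the parameter-efficient tuning the framework relies on.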