OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning

14 Mar 2024 | Lingyi Hong, Shilin Yan, Renrui Zhang, Wanyun Li, Xinyu Zhou, Pinxue Guo, Kaixun Jiang, Yiting Chen, Jinglun Li, Zhaoyu Chen, Wenqiang Zhang
OneTracker is a unified framework for visual object tracking that brings RGB tracking and RGB+X tracking tasks under a single model. It consists of two components: a Foundation Tracker and a Prompt Tracker. The Foundation Tracker is pre-trained on large-scale RGB tracking datasets to develop strong temporal matching ability. The Prompt Tracker is then adapted to RGB+X tracking tasks through parameter-efficient fine-tuning.

The Prompt Tracker incorporates additional modalities, such as language descriptions, masks, depth maps, thermal maps, and event maps, through Cross Modality Tracking Prompters (CMT Prompters) and Tracking Task Perception Transformer (TTP Transformer) layers. Because only these lightweight components are trained, the model adapts to each downstream task with few additional parameters while preserving the Foundation Tracker's strong localization ability.

By casting RGB, RGB+N (language), RGB+M (mask), RGB+D (depth), RGB+T (thermal), and RGB+E (event) tracking in a consistent format, OneTracker achieves state-of-the-art performance on multiple tracking benchmarks, demonstrating its effectiveness across diverse tracking tasks.
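To make the prompting idea concrete, here is a minimal PyTorch sketch of how a CMT Prompter might inject an auxiliary modality into the RGB token stream of a frozen Foundation Tracker. The layer names, dimensions, and bottleneck design are illustrative assumptions for this summary, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class CMTPrompter(nn.Module):
    """Illustrative Cross Modality Tracking Prompter (dimensions are assumptions).

    Projects patch tokens from an auxiliary modality (depth, thermal, event,
    mask, ...) through a small bottleneck and adds them to the RGB tokens,
    so the frozen Foundation Tracker can consume RGB+X input while only a
    few new parameters are trained.
    """

    def __init__(self, dim: int = 768, bottleneck: int = 32):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)  # down-project to bottleneck
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)    # up-project back to token dim

    def forward(self, rgb_tokens: torch.Tensor, x_tokens: torch.Tensor) -> torch.Tensor:
        # Both inputs: (batch, num_patches, dim).
        prompt = self.up(self.act(self.down(x_tokens)))
        return rgb_tokens + prompt  # modality cue injected as an additive prompt


def freeze_foundation_tracker(tracker: nn.Module) -> None:
    """Parameter-efficient tuning: freeze the pre-trained backbone so only
    prompter (and, per the paper, TTP Transformer) weights receive gradients."""
    for p in tracker.parameters():
        p.requires_grad = False


# Usage: fuse depth tokens into RGB tokens for an RGB+D sequence.
prompter = CMTPrompter(dim=768)
rgb = torch.randn(2, 196, 768)    # hypothetical RGB patch tokens
depth = torch.randn(2, 196, 768)  # hypothetical depth patch tokens
fused = prompter(rgb, depth)      # (2, 196, 768), fed to the frozen tracker
```

The additive bottleneck mirrors common adapter designs: the trainable parameter count stays small relative to the full backbone, which is the essence of the parameter-efficient tuning the framework relies on.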