Temporal Adaptive RGBT Tracking with Modality Prompt

2024 | Hongyu Wang, Xiaotao Liu*, Yifan Li, Meng Sun, Dian Yuan, Jing Liu
This paper proposes TATrack, a novel temporal adaptive RGBT tracking framework. RGBT tracking has been widely used in fields such as robotics, surveillance, and autonomous driving. Existing RGBT trackers focus on spatial information between the template and the search region and make only limited use of temporal information. TATrack addresses this with a spatio-temporal two-stream structure that captures temporal information through an online-updated template, which helps the tracker adapt to changes in the state of the target object. The template update is driven by the maximum target classification score output by the prediction head. The framework integrates feature extraction and cross-modal interaction using modality prompts: a modality-complementary prompter generates valid visual prompts for task-oriented multi-modal tracking, and a backbone of L standard vision transformer encoders performs feature extraction and relation modeling. A spatio-temporal interaction (STI) mechanism bridges the two branches, propagating spatio-temporal information across frames so that cross-modal interaction spans longer time scales. Together, these components let the tracker exploit spatio-temporal and multi-modal information for target localization. In extensive experiments on three popular RGBT tracking benchmarks (LasHeR, RGBT234, and RGBT210), TATrack outperforms previous state-of-the-art methods in precision, normalized precision, and success rate while running at real-time speed. It also performs well in challenging scenarios such as occlusion, fast movement, scale change, and aspect ratio change.
The results show that TATrack can effectively utilize temporal information to improve tracking performance.
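
The summary states only that the online template update is based on the maximum target classification score output by the prediction head. The sketch below shows one plausible reading of that rule and is not the authors' implementation. The function name `maybe_update_template`, the fixed `threshold` of 0.5, and the strict "greater than" comparison are all assumptions made for illustration.

```python
from typing import Any, Sequence, Tuple


def maybe_update_template(
    score_map: Sequence[Sequence[float]],
    candidate: Any,
    current_template: Any,
    threshold: float = 0.5,  # assumed value; the summary gives no threshold
) -> Tuple[Any, bool]:
    """Decide whether to replace the online template.

    score_map: 2D classification scores for the current frame, as a
        prediction head might output them (hypothetical layout).
    candidate: template cropped from the current frame's prediction.
    current_template: the online template currently in use.

    Returns (template_to_use, was_updated).
    """
    # The maximum classification score serves as a confidence proxy.
    peak = max(max(row) for row in score_map)

    # Assumed rule: accept the candidate only when the peak score is
    # strictly above the threshold; otherwise keep the old template.
    if peak > threshold:
        return candidate, True
    return current_template, False


if __name__ == "__main__":
    confident = [[0.1, 0.2], [0.9, 0.3]]
    uncertain = [[0.1, 0.2], [0.3, 0.2]]
    print(maybe_update_template(confident, "frame_42_crop", "frame_1_crop"))
    print(maybe_update_template(uncertain, "frame_43_crop", "frame_42_crop"))
```

Gating the update on a confidence score is a common way to avoid contaminating the template with occluded or drifted predictions. A real tracker would likely tune the threshold and might also limit how often updates occur.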