SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking

28 Mar 2024 | Xiaojun Hou, Jiazhen Xing, Yijie Qian, Yaowei Guo, Shuo Xin, Junhao Chen, Kai Tang, Mengmeng Wang, Zhengkai Jiang, Liang Liu, Yong Liu
SDSTrack is a symmetric multi-modal tracking framework for visual object tracking. It addresses limitations of existing methods with lightweight adaptation for efficient fine-tuning, which transfers the feature extraction capability of RGB-based trackers to other modalities such as depth, thermal, and event data. It also employs a complementary masked patch distillation strategy, based on self-distillation learning, to improve robustness and accuracy in complex environments. The symmetric structure integrates the modalities in a balanced way and prevents over-reliance on any single one.
Extensive experiments on benchmark datasets such as DepthTrack, VOT-RGBD2022, and RGBT234 show that SDSTrack outperforms state-of-the-art methods across tracking scenarios, including under extreme conditions. Its parameter-efficient fine-tuning and symmetric design make it suitable for real-world applications where robustness and accuracy are critical.
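
To make the "complementary masked patch" idea concrete, the following is a minimal sketch of one plausible reading: patches are masked at random in the RGB stream, and the other modality is masked at exactly the complementary positions, so every patch stays visible in one modality only. This is an illustration under stated assumptions, not the authors' implementation; the function names `complementary_masks` and `apply_mask`, the `mask_ratio` parameter, and the zero-fill choice are all hypothetical.

```python
import random


def complementary_masks(num_patches, mask_ratio=0.5, seed=None):
    """Build keep-masks for two modalities (hypothetical helper).

    Returns (rgb_keep, x_keep), two lists of booleans of length
    num_patches. A patch kept in RGB is masked in the other modality
    and vice versa, so the two masks are exact complements.
    """
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)
    # Assumption: mask_ratio is the fraction of RGB patches to hide.
    num_masked_rgb = int(round(num_patches * mask_ratio))
    masked_in_rgb = set(indices[:num_masked_rgb])
    rgb_keep = [i not in masked_in_rgb for i in range(num_patches)]
    x_keep = [not keep for keep in rgb_keep]
    return rgb_keep, x_keep


def apply_mask(tokens, keep, fill=0.0):
    """Replace masked patch tokens with a constant fill vector."""
    return [tok if k else [fill] * len(tok) for tok, k in zip(tokens, keep)]


if __name__ == "__main__":
    rgb_keep, x_keep = complementary_masks(num_patches=16, mask_ratio=0.5, seed=0)
    rgb_tokens = [[1.0, 2.0] for _ in range(16)]    # toy RGB patch embeddings
    depth_tokens = [[3.0, 4.0] for _ in range(16)]  # toy depth patch embeddings
    masked_rgb = apply_mask(rgb_tokens, rgb_keep)
    masked_depth = apply_mask(depth_tokens, x_keep)
    print(sum(rgb_keep), sum(x_keep))
```

In a self-distillation setup of this kind, the masked pair would typically be fed through the same network as the clean pair, with a loss that pulls the masked-branch features toward the clean-branch features; that training loop is omitted here because the summary does not specify it.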