SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking

28 Mar 2024 | Xiaojun Hou, Jiazhen Xing, Yijie Qian, Yaowei Guo, Shuo Xin, Junhao Chen, Kai Tang, Mengmeng Wang, Zhengkai Jiang, Liang Liu, Yong Liu
SDSTrack is a novel symmetric multimodal tracking framework designed to enhance the robustness and accuracy of visual object tracking (VOT) in complex environments. The framework addresses the limitations of traditional RGB-based trackers by introducing lightweight adaptation techniques and a complementary masked patch distillation strategy. Key contributions include:

1. **Symmetric Multimodal Adaptation (SMA)**: Efficiently transfers the feature extraction capability of a pretrained RGB backbone to other modalities (e.g., depth, thermal, event) through lightweight adapters, ensuring balanced and symmetric feature fusion (see the first sketch below).
2. **Complementary Masked Patch Distillation**: Randomly masks patches across modalities in a complementary manner and distills knowledge from the clean data to the masked data, improving the model's ability to handle extreme conditions (see the second sketch below).
3. **Parameter-Efficient Fine-Tuning**: Employs parameter-efficient fine-tuning (PEFT) to keep the number of trainable parameters small, making the framework well suited to the limited multimodal data available for training.
4. **Performance on Multiple Benchmarks**: Extensive experiments on DepthTrack, VOT-RGBD2022, RGBT234, and VisEvent demonstrate that SDSTrack outperforms prior methods, setting new state-of-the-art results in precision, recall, and F-score.
5. **Robustness in Extreme Conditions**: The method remains reliable in challenging scenarios such as missing or occluded modalities, with significant gains in precision and success rate.
6. **Inference Speed**: SDSTrack runs in real time (20.86 fps) while maintaining high accuracy and robustness.

The paper also provides implementation details, ablation studies, and visualizations that support the effectiveness of the proposed methods.
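To make the symmetric adapter idea concrete, below is a minimal PyTorch-style sketch. It is illustrative only: the names (`Adapter`, `SymmetricFusionAdapter`, `bottleneck`) are placeholders rather than the paper's actual modules, and it shows only the general pattern of routing each modality through an identically structured lightweight adapter before fusing the two streams with equal weight.

```python
# A minimal sketch of symmetric adapter fusion (PyTorch). All names here
# (Adapter, SymmetricFusionAdapter, bottleneck) are illustrative placeholders,
# not the paper's actual modules.
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Lightweight bottleneck adapter: down-project, non-linearity, up-project,
    with a residual connection so the frozen backbone features are preserved."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))


class SymmetricFusionAdapter(nn.Module):
    """Both modalities pass through adapters of identical structure and are
    fused with equal weight, so neither RGB nor the auxiliary modality
    (depth / thermal / event) is treated as dominant."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.rgb_adapter = Adapter(dim, bottleneck)
        self.aux_adapter = Adapter(dim, bottleneck)

    def forward(self, rgb_tokens: torch.Tensor, aux_tokens: torch.Tensor) -> torch.Tensor:
        # Adapt each modality independently, then fuse symmetrically.
        return 0.5 * (self.rgb_adapter(rgb_tokens) + self.aux_adapter(aux_tokens))


# Usage: token sequences of shape (batch, num_patches, dim) from a frozen ViT.
fusion = SymmetricFusionAdapter(dim=768)
fused = fusion(torch.randn(2, 196, 768), torch.randn(2, 196, 768))
print(fused.shape)  # torch.Size([2, 196, 768])
```

Only the adapter parameters are trained here; the backbone that produces the token sequences is assumed frozen, which is what keeps the fine-tuning parameter-efficient.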
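Complementary masked patch distillation can be sketched in a similarly simplified way. The names below (`complementary_masks`, `masked_distillation_loss`, `encode_fn`) are hypothetical, and the real method works inside the tracker's transformer rather than on standalone token tensors; the sketch only illustrates the idea that a patch masked in one modality remains visible in the other, and that features from the masked inputs are distilled toward features from the clean inputs.

```python
# A minimal sketch of complementary masked patch distillation (PyTorch). The
# function names and the encode_fn stand-in are hypothetical; the actual
# method operates inside the tracker's transformer backbone.
import torch
import torch.nn.functional as F


def complementary_masks(num_patches: int, mask_ratio: float = 0.5, device=None):
    """Patches masked in the RGB stream stay visible in the auxiliary stream,
    and vice versa, so the two modalities always complement each other."""
    rgb_mask = torch.rand(num_patches, device=device) < mask_ratio
    return rgb_mask, ~rgb_mask


def masked_distillation_loss(encode_fn, rgb_tokens, aux_tokens):
    """encode_fn(rgb, aux) -> fused features; assumed to be the (frozen
    backbone + adapter) encoder shared by both branches."""
    _, n, _ = rgb_tokens.shape
    rgb_mask, aux_mask = complementary_masks(n, device=rgb_tokens.device)

    # Teacher branch: clean inputs, no gradients.
    with torch.no_grad():
        teacher = encode_fn(rgb_tokens, aux_tokens)

    # Student branch: complementarily masked inputs (masked patches zeroed).
    rgb_masked = rgb_tokens.masked_fill(rgb_mask.view(1, n, 1), 0.0)
    aux_masked = aux_tokens.masked_fill(aux_mask.view(1, n, 1), 0.0)
    student = encode_fn(rgb_masked, aux_masked)

    # Distill the masked-input features toward the clean-input features.
    return F.mse_loss(student, teacher)


# Usage with a stand-in encoder (element-wise average of the two streams):
encode = lambda rgb, aux: 0.5 * (rgb + aux)
loss = masked_distillation_loss(encode, torch.randn(2, 196, 768), torch.randn(2, 196, 768))
print(loss.item())
```

Training on such masked inputs is what gives the tracker its robustness when one modality is degraded or missing at test time, since the encoder has learned to reconstruct consistent features from whichever patches remain visible.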