18 Aug 2018 | Zheng Zhu* (1,2), Qiang Wang* (1,2), Bo Li* (3), Wei Wu (3), Junjie Yan (3), and Weiming Hu (1,2)
This paper proposes a distractor-aware Siamese network framework for accurate and long-term visual object tracking. The main contributions are threefold: an analysis of the limitations of conventional Siamese trackers, a novel Distractor-aware Siamese Region Proposal Network (DaSiamRPN) that learns discriminative features, and an extension of the framework to long-term tracking via a local-to-global search strategy. The method targets interference from semantic backgrounds and distractors, which limits the performance of existing trackers. During offline training, an effective sampling strategy balances the distribution of training data and focuses the model on semantic distractors. During inference, a distractor-aware module incrementally learns from, and adapts to, the current video domain. A simple yet effective local-to-global search strategy handles long-term tracking scenarios. Extensive experiments on benchmark datasets show that DaSiamRPN significantly outperforms state-of-the-art methods, with a 9.6% relative gain on VOT2016 and a 35.9% relative gain on UAV20L. The tracker runs at 160 FPS on short-term benchmarks and 110 FPS on long-term benchmarks, demonstrating both high efficiency and accuracy. The code is available at https://github.com/foolwood/DaSiamRPN.
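The distractor-aware inference step can be pictured as re-ranking candidate proposals: each candidate is scored by its similarity to the target template, minus a weighted penalty for similarity to distractors collected in earlier frames. The sketch below is a minimal NumPy illustration of that idea under stated assumptions; the function name `rerank_proposals`, the cosine similarity, and the weighting parameters are illustrative choices, not the released implementation.

```python
import numpy as np

def rerank_proposals(target_emb, proposal_embs, distractor_embs,
                     alphas, alpha_hat=0.5):
    """Distractor-aware re-ranking over candidate proposals (sketch).

    Scores each proposal p_k by its similarity to the target template z,
    minus a weighted average similarity to known distractors d_i:
        score_k = f(z, p_k) - alpha_hat * sum_i(alpha_i * f(d_i, p_k)) / sum_i(alpha_i)
    """
    def sim(a, b):
        # cosine similarity between one embedding `a` and each row of `b`
        a = a / np.linalg.norm(a)
        b = b / np.linalg.norm(b, axis=-1, keepdims=True)
        return b @ a

    target_sim = sim(target_emb, proposal_embs)  # f(z, p_k) for each k
    if len(distractor_embs) > 0:
        # f(d_i, p_k), stacked over distractors, then weighted-averaged
        dist_sim = np.stack([sim(d, proposal_embs) for d in distractor_embs])
        penalty = alpha_hat * (alphas @ dist_sim) / alphas.sum()
    else:
        penalty = 0.0
    scores = target_sim - penalty
    return int(np.argmax(scores)), scores

# Toy usage with random embeddings (stand-ins for network features):
rng = np.random.default_rng(0)
z = rng.normal(size=128)                 # target template embedding
proposals = rng.normal(size=(8, 128))    # candidate proposal embeddings
distractors = rng.normal(size=(3, 128))  # embeddings of past distractors
best, scores = rerank_proposals(z, proposals, distractors, np.ones(3))
```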
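The local-to-global search strategy for long-term tracking can likewise be sketched as a loop: when tracking confidence drops, the search region is iteratively enlarged until the target is re-detected or the region covers the whole frame. The following is a hedged outline of that control flow; `detect_fn`, the region sizes, the growth factor, and the confidence threshold are all hypothetical placeholders rather than the paper's exact values.

```python
def local_to_global_search(frame, template_emb, detect_fn,
                           base_size=255, growth=1.3, max_size=767,
                           conf_threshold=0.8):
    """Local-to-global re-detection (sketch).

    `detect_fn(frame, template_emb, search_size)` is assumed to run the
    tracker on a search region of the given size and return (box, conf).
    Starting from the local search size, the region grows geometrically
    until the target is found with sufficient confidence.
    """
    size = base_size
    while size <= max_size:
        box, conf = detect_fn(frame, template_emb, search_size=size)
        if conf >= conf_threshold:
            return box, conf          # target re-detected; resume local tracking
        size = int(size * growth)     # enlarge the search region and retry
    return None, 0.0                  # not found even at the global scale
```

The appeal of this design is that the expensive global search is only triggered on failure, which is consistent with the reported 110 FPS on long-term benchmarks.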