Distractor-aware Siamese Networks for Visual Object Tracking

18 Aug 2018 | Zheng Zhu*1,2, Qiang Wang*1,2, Bo Li*3, Wei Wu3, Junjie Yan3, and Weiming Hu1,2
This paper addresses visual object tracking by proposing a distractor-aware Siamese network, named DaSiamRPN. The main issue with conventional Siamese trackers is that their features discriminate the foreground only from non-semantic backgrounds, so semantic distractors cause poor performance in cluttered scenes. The authors analyze the features used in conventional Siamese trackers and identify the imbalanced distribution of training data as a key problem: non-semantic background samples dominate, so the learned embeddings lack discriminative power against semantic distractors. They propose an effective sampling strategy to control this distribution and make the model focus on semantic distractors. During inference, a novel distractor-aware module performs incremental learning, effectively transferring the general embedding to the current video domain. The approach is extended to long-term tracking with a local-to-global search region strategy, significantly improving performance under out-of-view and full-occlusion scenarios. Extensive experiments on several benchmarks show that DaSiamRPN outperforms state-of-the-art methods, achieving a 9.6% relative gain on VOT2016 and a 35.9% relative gain on UAV20L. The tracker also runs at high speed: 160 FPS on short-term benchmarks and 110 FPS on long-term benchmarks. The code is available at <https://github.com/foolwood/DaSiamRPN>.
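To make the inference-time distractor-aware module concrete, below is a minimal NumPy sketch of the paper's selection rule: a proposal's similarity to the target template is penalized by its weighted similarity to distractors collected in earlier frames, i.e. q = argmax_k f(z, p_k) − α̂ Σ_i α_i f(d_i, p_k) / Σ_i α_i. The cosine similarity standing in for the learned Siamese score f, and all function and parameter names, are illustrative assumptions rather than the repository's actual API.

```python
import numpy as np

def cosine_sim(a, b):
    # Stand-in for the Siamese score f(z, x); the real tracker correlates
    # learned convolutional embeddings rather than raw vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def distractor_aware_select(target_emb, proposal_embs, distractor_embs,
                            alpha_hat=0.5, weights=None):
    """Pick the proposal maximizing
        f(z, p_k) - alpha_hat * sum_i a_i * f(d_i, p_k) / sum_i a_i
    where d_i are distractor embeddings from earlier frames. alpha_hat and
    the per-distractor weights a_i are illustrative defaults.
    """
    if weights is None:
        weights = np.ones(len(distractor_embs))
    weights = np.asarray(weights, dtype=float)
    scores = []
    for p in proposal_embs:
        s = cosine_sim(target_emb, p)
        if len(distractor_embs) > 0:
            penalty = sum(w * cosine_sim(d, p)
                          for w, d in zip(weights, distractor_embs))
            s -= alpha_hat * penalty / weights.sum()
        scores.append(s)
    return int(np.argmax(scores)), scores

# Usage: embeddings here are random vectors purely for demonstration.
rng = np.random.default_rng(0)
target = rng.normal(size=128)
proposals = [rng.normal(size=128) for _ in range(5)]
distractors = [rng.normal(size=128) for _ in range(3)]
best_idx, _ = distractor_aware_select(target, proposals, distractors)
print("selected proposal:", best_idx)
```

The local-to-global strategy mentioned above can be sketched in the same spirit: while the detection score stays high the tracker keeps a fixed local search region, and once the score drops (out of view or full occlusion) the region grows step by step until the target is re-detected. The threshold, step size, and region sizes below are assumed values for illustration, not the paper's exact settings.

```python
def next_search_size(cur_size, score, conf_thresh=0.8,
                     local_size=255, step=64, global_max=767):
    # Confident detection: stay with (or reset to) the local search region.
    if score >= conf_thresh:
        return local_size
    # Target lost: grow the region by a constant step, capped at a maximum
    # that effectively covers the whole frame.
    return min(cur_size + step, global_max)
```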