8 Jun 2020 | Goutam Bhat*, Martin Danelljan*, Luc Van Gool, Radu Timofte
This paper presents a novel end-to-end trainable tracking architecture that improves target-background discriminability by fully exploiting both target and background appearance information. The proposed method, called DiMP, addresses a key limitation of existing Siamese-based trackers, which use only target appearance when inferring the model. The architecture is derived from a discriminative learning loss: an iterative optimization procedure is unrolled into the network, allowing it to predict a powerful target model in only a few iterations. Key parameters of the discriminative loss itself are also learned, enabling effective end-to-end training. The proposed tracker sets a new state of the art on six tracking benchmarks, achieving an EAO score of 0.440 on VOT2018 while running at over 40 FPS. The code and models are available at https://github.com/visionml/pytracking.
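As a concrete illustration of such an unrolled optimizer, the sketch below minimizes a simple quadratic discriminative loss with steepest descent, using the exact step length available for quadratic objectives. It is a minimal stand-in, not the paper's implementation: DiMP operates on convolutional feature maps with a robust hinge-like loss and learned loss parameters, whereas this example uses a plain linear least-squares model; the function name `steepest_descent_predictor` and all tensor shapes are illustrative.

```python
import torch

def steepest_descent_predictor(X, y, reg=0.01, num_iter=5, w0=None):
    """Sketch of an iterative model predictor in the spirit of DiMP.

    Minimizes the quadratic discriminative loss
        L(w) = ||X w - y||^2 + reg * ||w||^2
    by steepest descent with the exact step length for a quadratic
    objective. X: (num_samples, feat_dim) training features;
    y: (num_samples,) regression labels, e.g. a Gaussian centered
    on the target.
    """
    w = torch.zeros(X.shape[1]) if w0 is None else w0.clone()
    for _ in range(num_iter):
        residual = X @ w - y
        grad = 2 * (X.t() @ residual + reg * w)            # gradient of L at w
        Hg = 2 * (X.t() @ (X @ grad) + reg * grad)         # Hessian-vector product
        alpha = (grad @ grad) / (grad @ Hg).clamp(min=1e-8)  # exact step length
        w = w - alpha * grad                               # steepest-descent update
    return w

# toy usage: random features, Gaussian-like labels
X = torch.randn(64, 16)
y = torch.exp(-((torch.arange(64.0) - 32) ** 2) / 50.0)
w = steepest_descent_predictor(X, y)
print(w.shape)  # torch.Size([16])
```

Because each update uses a closed-form step length rather than a tuned learning rate, a handful of iterations already gives a strong model, which is what makes unrolling the optimizer into the network practical.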
The key contributions of the paper include: (1) a discriminative model-prediction architecture that fully exploits both target and background appearance information when predicting the target model; (2) an end-to-end training approach in which the discriminative loss itself is learned; (3) a steepest-descent-based optimizer that converges in very few iterations; (4) an effective model initializer that provides a reasonable first estimate of the target model, which the optimizer then refines (see the sketch below); and (5) a flexible architecture that can be adapted to different tracking scenarios.
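The sketch below illustrates the role of the initializer in contribution (4): produce a cheap first filter estimate from target features alone, then hand it to the iterative optimizer for refinement. This is a deliberate simplification; in the paper the initializer is a learned convolutional layer followed by precise ROI pooling of the target region, whereas here we merely pool target feature vectors. The name `init_filter` and the shapes are hypothetical.

```python
import torch

def init_filter(target_feats):
    # Hypothetical initializer: pool features sampled inside the target
    # box into a single normalized filter vector. The paper instead uses
    # a learned conv layer with precise ROI pooling; this is a stand-in.
    w0 = target_feats.mean(dim=0)
    return w0 / (w0.norm() + 1e-8)

# usage: start from the pooled-target estimate, then let the iterative
# optimizer from the earlier sketch refine it in a few steps, e.g.
#   w = steepest_descent_predictor(X, y, w0=init_filter(target_feats))
target_feats = torch.randn(32, 16)  # features from within the target box
w0 = init_filter(target_feats)
```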
The method is evaluated on seven tracking benchmarks: VOT2018, LaSOT, TrackingNet, GOT10k, NFS, OTB-100, and UAV123. The results show that it outperforms existing state-of-the-art approaches in accuracy, robustness, and generalization, achieving leading AUC scores on the AUC-based benchmarks and the top EAO score on VOT2018, while running at over 40 FPS on a single GPU. The paper also provides an extensive experimental analysis of the architecture, quantifying the impact of each component. The results demonstrate that the method is effective across a wide range of scenarios, including long sequences, fast-moving objects, and low-altitude aerial videos.
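For reference, the AUC numbers cited above follow the standard success-plot protocol used by OTB/LaSOT-style benchmarks: for each overlap threshold, count the fraction of frames whose predicted box overlaps the ground truth by at least that much, then average across thresholds. The sketch below uses the common 21-threshold convention; individual benchmark toolkits differ in details such as threshold spacing and the handling of frames where the target is absent.

```python
import numpy as np

def success_auc(ious, thresholds=np.linspace(0, 1, 21)):
    """Success-plot AUC for one sequence: average, over overlap
    thresholds, of the fraction of frames with IoU at or above
    the threshold. `ious` is a 1-D array of per-frame IoU values."""
    ious = np.asarray(ious)
    success = [(ious >= t).mean() for t in thresholds]
    return float(np.mean(success))

# toy usage with made-up per-frame overlaps
print(success_auc([0.9, 0.8, 0.0, 0.75, 0.6]))
```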