DINO-Tracker is a novel framework for long-term dense point tracking in videos that combines test-time training with pre-trained DINO-ViT features. Rather than using DINO's features as-is, the method refines them on the test video, so the semantic information DINO has already captured acts as a prior that improves tracking. Key contributions include:
1. **Combining Test-Time Training and External Priors**: DINO-Tracker combines test-time training on a single video with the powerful localized semantic features learned by a pre-trained DINO-ViT model.
2. **Refined Features**: The framework refines DINO's features to better fit the motion observations of the test video, enhancing the robustness and accuracy of tracking.
3. **Self-Supervised Losses**: The method uses a combination of self-supervised losses and regularization to retain and benefit from DINO's semantic prior.
4. **State-of-the-Art Performance**: Extensive evaluation shows that DINO-Tracker achieves state-of-the-art results on established point-tracking benchmarks. It outperforms both self-supervised methods and leading supervised trackers, especially in challenging cases of long-term occlusion.
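The sketch below illustrates the general idea behind the list above under simplifying assumptions. It is not the authors' implementation. Refined features are modeled as DINO features plus a residual, where the `residual` array stands in for a learned refiner. A point is tracked by choosing, in every frame, the location whose feature has the highest cosine similarity to the query point's feature. The prior-preserving regularizer is a generic squared-distance penalty. The function names `refine`, `track_point`, and `regularized_objective` and the `weight` value are hypothetical. The actual method trains its refiner with its own self-supervised losses and uses more careful matching.

```python
import numpy as np


def refine(dino_feats, residual):
    """Hypothetical refinement: frozen DINO features plus a residual.

    `residual` stands in for the output of a learned refiner network;
    here it is simply an array of the same shape.
    """
    return dino_feats + residual


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale feature vectors to unit length (eps avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def track_point(feats, query_frame, query_yx):
    """Track one query point through all frames by nearest-feature matching.

    feats: (T, H, W, C) array of per-frame feature maps.
    query_yx: (row, col) of the query point in `query_frame`.
    Returns a (T, 2) integer array of (row, col) positions, one per frame.
    """
    feats = l2_normalize(feats)
    qy, qx = query_yx
    query = feats[query_frame, qy, qx]               # (C,) query descriptor
    sims = np.einsum("thwc,c->thw", feats, query)    # cosine-similarity maps
    T, H, W = sims.shape
    flat_idx = sims.reshape(T, -1).argmax(axis=1)    # best match per frame
    ys, xs = np.unravel_index(flat_idx, (H, W))
    return np.stack([ys, xs], axis=1)


def regularized_objective(corr_loss, refined, dino_feats, weight=0.1):
    """Generic test-time objective (an assumption, not the paper's losses):
    a correspondence term plus a penalty keeping refined features near DINO's.
    """
    prior_term = np.mean((refined - dino_feats) ** 2)
    return corr_loss + weight * prior_term
```

As a usage example, suppose a synthetic video has one distinctive feature vector that moves diagonally across frames. Then `track_point` returns that diagonal path. The regularizer's weight trades off fitting the test video against staying close to DINO's semantic prior.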
The paper also includes a detailed methodological section, explaining the Delta-DINO model, the optimization objective, and the evaluation on various benchmarks. The results show that DINO-Tracker outperforms existing methods in terms of position accuracy and occlusion accuracy, particularly in scenarios with long-term occlusions.
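To make the two reported metrics concrete, here is a minimal sketch. It assumes the TAP-Vid-style definitions rather than the paper's own evaluation code. Position accuracy is taken as the fraction of ground-truth-visible points whose prediction falls within δ pixels, averaged over δ ∈ {1, 2, 4, 8, 16}. Occlusion accuracy is the fraction of (point, frame) pairs whose visibility is predicted correctly. The sketch simplifies in several ways: it pools all points together, does not rescale videos to a fixed resolution, and does not exclude query frames. The function names are illustrative.

```python
import numpy as np


def position_accuracy(pred, gt, gt_visible, thresholds=(1, 2, 4, 8, 16)):
    """Average over pixel thresholds of the fraction of visible points
    whose predicted position lies strictly within that threshold.

    pred, gt: (N, T, 2) arrays of (x, y) positions in pixels.
    gt_visible: (N, T) boolean array of ground-truth visibility.
    """
    dist = np.linalg.norm(pred - gt, axis=-1)          # (N, T) pixel errors
    visible_dist = dist[gt_visible]                    # only visible points count
    fractions = [np.mean(visible_dist < d) for d in thresholds]
    return float(np.mean(fractions))


def occlusion_accuracy(pred_visible, gt_visible):
    """Fraction of (point, frame) pairs with correctly predicted visibility."""
    return float(np.mean(pred_visible == gt_visible))
```

For example, a prediction that is off by 3 pixels on every visible point passes the 4, 8, and 16 pixel thresholds but fails 1 and 2, giving a position accuracy of 0.6.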