The paper introduces LoRAT, a method that applies Low-Rank Adaptation (LoRA) to visual tracking, enabling faster training, larger backbones, and stronger performance. LoRAT uses LoRA, a parameter-efficient fine-tuning technique, to adapt large pre-trained Vision Transformers (ViTs) to the tracking task. The key contributions include:
1. **Decoupled Positional Embeddings**: LoRAT decouples the positional embeddings in transformer-based trackers into shared spatial embeddings and independent token-type embeddings, allowing pre-trained ViT models to be adapted more faithfully (a sketch follows this list).
2. **Anchor-Free Head Network**: A multilayer perceptron (MLP)-based anchor-free head replaces the usual convolutional head, avoiding its inductive biases and improving performance with less computational overhead (also sketched after this list).
3. **Efficient Training and Inference**: LoRAT achieves significant improvements in training efficiency and inference speed, making it practical to train large-scale trackers with limited resources.
4. **State-of-the-Art Performance**: LoRAT achieves state-of-the-art or highly competitive results on multiple benchmarks, including LaSOT, LaSOT_ext, TrackingNet, GOT-10k, and TNL2K.
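To make the LoRA adaptation described in the opening paragraph concrete, below is a minimal PyTorch sketch of attaching a trainable low-rank update to a frozen linear layer of a pre-trained ViT. The class and parameter names (`LoRALinear`, `rank`, `alpha`, `add_lora_to_vit`) and the wrapped attribute names are illustrative assumptions, not the paper's code; the point is only that the two small matrices are the sole trainable weights, which is what keeps fine-tuning cheap.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer augmented with a trainable low-rank update:
    y = W x + (alpha / r) * B (A x), where only A and B receive gradients."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                  # freeze the pre-trained weights
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.normal_(self.lora_a.weight, std=0.02)
        nn.init.zeros_(self.lora_b.weight)           # low-rank update starts at zero
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))


# Example: wrap the qkv projections of a ViT-like backbone. The attribute names
# (`blocks`, `attn.qkv`) follow common ViT implementations and are assumptions.
def add_lora_to_vit(vit: nn.Module, rank: int = 8) -> nn.Module:
    for blk in vit.blocks:
        blk.attn.qkv = LoRALinear(blk.attn.qkv, rank=rank)
    return vit
```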
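Contribution 1, the decoupled positional embedding, could be sketched roughly as follows; the shapes, names (`DecoupledPosEmbed`, `token_type`), and interpolation details are assumptions for illustration, not the paper's exact implementation. Template and search tokens share one resized spatial embedding inherited from the pre-trained ViT, while a small learned token-type embedding distinguishes the two token streams.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoupledPosEmbed(nn.Module):
    """Shared spatial positional embedding (reused from the pre-trained ViT,
    resized per input) plus independent token-type embeddings that tell
    template tokens and search tokens apart."""

    def __init__(self, pretrained_pos: torch.Tensor, dim: int):
        super().__init__()
        # pretrained_pos: (1, N, dim) patch positional table from the ViT
        self.spatial = nn.Parameter(pretrained_pos.clone())
        self.token_type = nn.Embedding(2, dim)       # 0 = template, 1 = search

    def _resized(self, num_tokens: int) -> torch.Tensor:
        # Bilinearly resize the square embedding grid to the target token count.
        src = int(self.spatial.shape[1] ** 0.5)
        dst = int(num_tokens ** 0.5)
        grid = self.spatial.reshape(1, src, src, -1).permute(0, 3, 1, 2)
        grid = F.interpolate(grid, size=(dst, dst), mode="bilinear", align_corners=False)
        return grid.permute(0, 2, 3, 1).reshape(1, num_tokens, -1)

    def forward(self, template: torch.Tensor, search: torch.Tensor) -> torch.Tensor:
        # template: (B, Nt, dim); search: (B, Ns, dim)
        t = template + self._resized(template.shape[1]) + self.token_type.weight[0]
        s = search + self._resized(search.shape[1]) + self.token_type.weight[1]
        return torch.cat([t, s], dim=1)              # joint sequence fed to the ViT
```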
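Contribution 2, the MLP-based anchor-free head, could look roughly like the sketch below: each search-region token independently predicts a foreground score and a normalized box, with no convolutions and no anchor boxes. The layer sizes and the four-parameter box output are assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class MLPAnchorFreeHead(nn.Module):
    """Per-token MLP head: every search-region token predicts a classification
    score and normalized box parameters, anchor-free and convolution-free."""

    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        self.cls = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, 1))
        self.reg = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, 4))

    def forward(self, search_tokens: torch.Tensor):
        # search_tokens: (B, Ns, dim) encoder outputs for the search region
        scores = self.cls(search_tokens).squeeze(-1)  # (B, Ns) foreground score map
        boxes = self.reg(search_tokens).sigmoid()     # (B, Ns, 4) normalized box params
        return scores, boxes
```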
The paper also provides detailed experimental results and ablation studies to validate the effectiveness of LoRAT, demonstrating that it handles large-scale visual tracking tasks both efficiently and effectively.