Autoregressive Queries for Adaptive Tracking with Spatio-Temporal Transformers

15 Mar 2024 | Jinxia Xie, Bineng Zhong, Zhiyi Mo, Shengping Zhang, Liangtao Shi, Shuxiang Song, Rongrong Ji
The paper introduces AQATrack, an adaptive tracking framework that leverages spatio-temporal transformers to capture and exploit spatio-temporal information for visual tracking. Its key contributions include:

1. **Spatio-Temporal Information Exploration**: AQATrack learns spatio-temporal information with autoregressive queries rather than hand-crafted update components, enabling a more comprehensive exploration of spatio-temporal features.
2. **Novel Attention Mechanism**: A new attention mechanism lets the queries from previous frames interact to generate the query for the current frame, improving the model's ability to capture instantaneous changes in target appearance (a minimal sketch of this autoregressive decoding step follows this list).
3. **Spatio-Temporal Information Fusion Module (STM)**: This module combines the target's static appearance with its instantaneous appearance changes to guide robust tracking, improving performance on challenging benchmarks (see the fusion sketch after this list).
4. **Performance on Benchmark Datasets**: Extensive experiments on six popular tracking benchmarks (LaSOT, LaSOT_ext, TrackingNet, GOT-10k, TNL2K, and UAV123) show that AQATrack achieves state-of-the-art performance, with notable gains over existing trackers.
5. **Implementation Details**: The tracker is implemented in PyTorch and trained on NVIDIA V100 GPUs. Two variants, AQATrack-256 and AQATrack-384, differ in their template and search-region sizes.
6. **Ablation Study**: Ablations on key components, including the temporal decoder, the temporal queries, and the length of the spatio-temporal information, show that each contributes significantly to the model's performance.
7. **Conclusion**: AQATrack's ability to model continuous spatio-temporal information, together with its competitive results across multiple benchmarks, highlights its potential for advanced visual tracking tasks.
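To make the autoregressive query idea concrete, here is a minimal PyTorch sketch of one temporal decoding step, assuming a standard transformer decoder layer: the queries accumulated from earlier frames first interact among themselves, then attend to the current frame's search-region features to produce the query for the current frame. The class and parameter names (`TemporalQueryDecoder`, `embed_dim`, etc.) are illustrative assumptions, not the paper's released code.

```python
import torch
import torch.nn as nn

class TemporalQueryDecoder(nn.Module):
    """One decoder step: queries carried over from previous frames attend to
    the current search-region features, yielding the query for the current frame."""
    def __init__(self, embed_dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(embed_dim, 4 * embed_dim), nn.GELU(),
            nn.Linear(4 * embed_dim, embed_dim),
        )
        self.norm1 = nn.LayerNorm(embed_dim)
        self.norm2 = nn.LayerNorm(embed_dim)
        self.norm3 = nn.LayerNorm(embed_dim)

    def forward(self, prev_queries: torch.Tensor, search_feat: torch.Tensor) -> torch.Tensor:
        # prev_queries: (B, T, C) queries accumulated over earlier frames
        # search_feat:  (B, HW, C) flattened features of the current search region
        q = self.norm1(prev_queries + self.self_attn(prev_queries, prev_queries, prev_queries)[0])
        q = self.norm2(q + self.cross_attn(q, search_feat, search_feat)[0])
        return self.norm3(q + self.ffn(q))  # (B, T, C); the last query serves frame t
```

In a tracking loop, the newest query would be appended to a sliding window of past queries before the next frame is processed, which is what makes the scheme autoregressive rather than a one-shot template match.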
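The fusion step (item 3) can be sketched in the same spirit. The snippet below is an assumption-laden illustration of combining static and instantaneous appearance, not the paper's exact STM design: each search-region location gathers instantaneous appearance cues from the temporal queries via cross-attention, and the result is fused with the template-conditioned search features by a simple concatenate-and-project step.

```python
import torch
import torch.nn as nn

class SpatioTemporalFusion(nn.Module):
    """Fuses template-conditioned (static) search features with the
    instantaneous appearance carried by the autoregressive temporal queries."""
    def __init__(self, embed_dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.proj = nn.Linear(2 * embed_dim, embed_dim)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, search_feat: torch.Tensor, temporal_queries: torch.Tensor) -> torch.Tensor:
        # search_feat:      (B, HW, C) search features already matched against the template
        # temporal_queries: (B, T, C)  queries produced by the temporal decoder
        dynamic, _ = self.cross_attn(search_feat, temporal_queries, temporal_queries)
        fused = self.proj(torch.cat([search_feat, dynamic], dim=-1))
        return self.norm(fused)  # (B, HW, C), fed to the box prediction head
```

The design intent matches the summary above: the static branch anchors the tracker to the original target appearance, while the query branch injects the most recent appearance changes before box prediction.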