ADA-Track: End-to-End Multi-Camera 3D Multi-Object Tracking with Alternating Detection and Association

14 May 2024 | Shuxiao Ding, Lukas Schneider, Marius Cordts, Juergen Gall
ADA-Track is a novel end-to-end multi-camera 3D multi-object tracking framework that combines the strengths of the tracking-by-attention and tracking-by-detection paradigms. It introduces a learnable data association module based on edge-augmented cross-attention, which uses both appearance and geometric features to associate track queries with detection queries. The module is integrated into each decoder layer of a DETR-based 3D detector, so that a single layer performs both DETR-like query-to-image cross-attention for detection and query-to-query cross-attention for data association. Stacking these decoder layers refines the queries alternately for detection and for association, which lets each task benefit from the other.
The method is evaluated on the nuScenes dataset, where it outperforms existing tracking paradigms and achieves state-of-the-art results on the nuScenes tracking benchmark. The framework is designed to be compatible with various query-based 3D detectors, and the model architecture details, including the ResNet-101 and VoVNetV2 backbones, are provided. Additional experiments show improvements in tracking accuracy, robustness against appearance changes, and handling of class imbalance. In terms of computational complexity and runtime, the framework adds only minimal overhead compared to existing methods. Overall, the results support the alternating detection and association paradigm as an effective route to high-quality 3D multi-object tracking.
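To make the association step more concrete, the following is a minimal NumPy sketch of one way edge-augmented cross-attention between detection and track queries could look. It is not the authors' implementation. The function names, the single-head formulation, the choice of edge features (centre offset plus distance), and the use of the edge features as an additive bias on the attention logits are all assumptions made for illustration. The query-to-image cross-attention for detection is omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def geometric_edges(det_centers, track_centers):
    """Hypothetical edge features: centre offset (dx, dy, dz) plus distance."""
    diff = det_centers[:, None, :] - track_centers[None, :, :]  # (N_det, N_trk, 3)
    dist = np.linalg.norm(diff, axis=-1, keepdims=True)         # (N_det, N_trk, 1)
    return np.concatenate([diff, dist], axis=-1)                # (N_det, N_trk, 4)


def edge_augmented_cross_attention(det_q, track_q, edges, w_q, w_k, w_v, w_e):
    """Single-head attention from detection queries to track queries.

    det_q: (N_det, d), track_q: (N_trk, d), edges: (N_det, N_trk, d_e).
    w_q, w_k, w_v: (d, d) projections; w_e: (d_e,) maps an edge to a scalar bias.
    """
    d = det_q.shape[1]
    q, k, v = det_q @ w_q, track_q @ w_k, track_q @ w_v
    appearance = q @ k.T / np.sqrt(d)   # query similarity, (N_det, N_trk)
    geometry = edges @ w_e              # edge-derived bias, (N_det, N_trk)
    attn = softmax(appearance + geometry, axis=1)
    return det_q + attn @ v, attn       # residual update and soft association


# Toy usage with random (untrained) weights; all values are illustrative.
rng = np.random.default_rng(0)
d = 8
det_q = rng.normal(size=(3, d))
track_q = rng.normal(size=(2, d))
det_centers = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [5.0, 5.0, 0.0]])
track_centers = np.array([[0.1, 0.0, 0.0], [10.2, 0.0, 0.0]])
w_q, w_k, w_v = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
w_e = np.array([0.0, 0.0, 0.0, -1.0])  # assumed: penalise distance only
edges = geometric_edges(det_centers, track_centers)

# Stacking: each pass refines the detection queries before the next one.
# A real decoder would use per-layer weights and interleave image attention.
for _ in range(3):
    det_q, attn = edge_augmented_cross_attention(
        det_q, track_q, edges, w_q, w_k, w_v, w_e
    )
```

Each row of `attn` is a soft assignment of one detection query over the track queries. In this sketch the geometric bias pulls the assignment towards nearby tracks, while the query dot product contributes the appearance cue.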