Multiple Object Tracking as ID Prediction

25 Mar 2024 | Ruopeng Gao, Yijun Zhang, and Limin Wang
This paper proposes MOTIP, which treats multiple object tracking (MOT) as an end-to-end ID prediction problem. Given historical trajectory information, the method directly predicts ID labels for the objects in the current frame, removing the need for heuristic algorithms and surrogate tasks. The architecture is deliberately streamlined and consists of three components: a DETR detector, a learnable ID dictionary, and an ID decoder. The ID decoder takes the embeddings of the current frame's objects and their corresponding ID embeddings and predicts the ID labels.
MOTIP achieves strong performance on complex benchmarks such as DanceTrack and SportsMOT, and is competitive with other transformer-based methods on MOT17. The paper also discusses the limitations of existing tracking-by-detection methods and the advantages of the proposed approach. The results indicate that MOTIP can learn tracking capabilities directly from training data, which leads to better performance in complex scenarios. Comparisons with state-of-the-art methods across these datasets demonstrate its effectiveness and potential for future research.
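To make the idea of tracking as ID prediction more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a heavily simplified setup: a single dot-product attention step stands in for the learned ID decoder, integer ID labels stand in for the learnable ID dictionary, and a fixed extra score stands in for a "new object" class. The function name predict_ids and the parameters temperature and newborn_score are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def predict_ids(history, detections, num_ids, temperature=0.1, newborn_score=0.5):
    """Assign an ID label to each detection in the current frame.

    history: list of (embedding, id_label) pairs from earlier frames.
    detections: list of embeddings for the current frame's objects.
    Returns one label per detection; the label num_ids means "new object".
    All names and defaults here are illustrative assumptions.
    """
    predictions = []
    for query in detections:
        # Similarity to every historical token, plus one fixed "newborn" slot.
        scores = [dot(query, emb) / temperature for emb, _ in history]
        scores.append(newborn_score / temperature)
        weights = softmax(scores)

        # Sum attention mass per ID label: a distribution over num_ids + 1 classes.
        id_probs = [0.0] * (num_ids + 1)
        for weight, (_, id_label) in zip(weights[:-1], history):
            id_probs[id_label] += weight
        id_probs[num_ids] = weights[-1]

        # ID prediction is a classification: pick the most probable label.
        predictions.append(max(range(num_ids + 1), key=lambda k: id_probs[k]))
    return predictions


# Two known trajectories (IDs 0 and 1) and three detections in the current frame.
history = [([1.0, 0.0], 0), ([0.9, 0.1], 0), ([0.0, 1.0], 1)]
detections = [[0.95, 0.05], [0.05, 0.95], [-1.0, -1.0]]
print(predict_ids(history, detections, num_ids=2))  # [0, 1, 2]
```

In this toy example the third detection resembles neither trajectory, so it falls into the extra class and is treated as a new object. In the method described above, the similarity function and the ID representations are learned from training data rather than fixed by hand as they are here.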