13 Jan 2024 | Mang Ye, Shuoyi Chen, Chenyue Li, Wei-Shi Zheng (IEEE Senior Member), David Crandall (IEEE Senior Member), Bo Du (IEEE Senior Member)
This paper provides a comprehensive review and in-depth analysis of Transformer-based Object Re-Identification (Re-ID). It categorizes existing work into four main areas: image/video-based Re-ID, Re-ID with limited data/annotations, cross-modal Re-ID, and special Re-ID scenarios. The authors highlight the advantages of Transformers in addressing challenges across these domains, including occlusions, lighting changes, and diverse viewpoints. They propose a new Transformer baseline, UntransReID, which achieves state-of-the-art performance on both single- and cross-modal tasks. The survey also covers progress in animal Re-ID, presenting a standardized experimental benchmark and evaluating the applicability of Transformers in this setting. Additionally, the paper discusses open issues in the era of large foundation models, aiming to serve as a new handbook for researchers in the field.

The introduction reviews the development of Re-ID before the arrival of Transformers, emphasizing the limitations of CNN-based methods and the potential of Transformers in handling complex and dynamic scenarios. The background section analyzes the strengths of Transformers in detail: powerful modeling capability, diverse unsupervised learning paradigms, multi-modal uniformity, and strong scalability and generalization. The main body reviews the latest Transformer-based Re-ID research across the four areas above, and the paper concludes with a discussion of future directions and open issues in the field.
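One strength the survey attributes to Transformers is global self-attention over image patches, which lets features for an occluded or poorly lit region borrow context from every visible patch. The sketch below is purely illustrative and is not the paper's UntransReID baseline: a minimal single-head patch-attention encoder with randomly initialized weights (all function names, dimensions, and the patch size are assumptions), pooling attended patch tokens into one unit-norm identity descriptor.

```python
import numpy as np

rng = np.random.default_rng(0)

def patchify(img, patch=8):
    """Split an HxWxC image into flattened non-overlapping patches."""
    h, w, c = img.shape
    rows, cols = h // patch, w // patch
    patches = img[:rows * patch, :cols * patch].reshape(
        rows, patch, cols, patch, c).transpose(0, 2, 1, 3, 4)
    return patches.reshape(rows * cols, patch * patch * c)

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over patch tokens."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)       # softmax rows
    return attn @ v                                # context-mixed tokens

def reid_embedding(img, dim=64, patch=8):
    """Patchify -> project -> attend -> mean-pool into one identity vector.

    Weights are random here (a sketch, not a trained model); a real system
    would learn them with an identity-classification or metric loss.
    """
    x = patchify(img, patch)
    wp = rng.standard_normal((x.shape[1], dim)) * 0.02       # patch projection
    wq, wk, wv = (rng.standard_normal((dim, dim)) * 0.02 for _ in range(3))
    tokens = self_attention(x @ wp, wq, wk, wv)
    feat = tokens.mean(axis=0)                    # global identity descriptor
    return feat / np.linalg.norm(feat)            # unit-norm, ready for cosine

emb = reid_embedding(rng.random((64, 32, 3)))     # toy 64x32 person crop
print(emb.shape)
```

Because every output token is a weighted mix of all patches, the pooled descriptor is not dominated by any single local region, which is the intuition behind attention-based robustness to occlusion discussed in the survey.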