View-decoupled Transformer for Person Re-identification under Aerial-ground Camera Network


21 Mar 2024 | Quan Zhang, Lei Wang, Vishal M. Patel, Xiaohua Xie, Jianhuang Lai
This paper proposes a view-decoupled transformer (VDT) for person re-identification (ReID) under aerial-ground camera networks (AGPReID). Existing ReID methods focus mainly on homogeneous camera networks, whereas AGPReID, which involves heterogeneous aerial and ground cameras, remains underexplored. VDT addresses the dramatic view discrepancy between the two camera types by decoupling view-related from view-unrelated features through hierarchical subtractive separation and an orthogonal loss. This separates global identity features from view-specific ones and enables more discriminative identity learning.

To support AGPReID research, the authors introduce CARGO, a large-scale synthetic dataset containing 5,000 identities and 108,563 images captured by five aerial and eight ground cameras. Experiments on CARGO and AG-ReID show that VDT outperforms existing methods on both mAP and Rank-1 while keeping comparable computational complexity, and cross-dataset evaluations demonstrate its robustness to domain shifts. By mitigating the impact of view bias on identity representation, view decoupling makes VDT suitable for both homogeneous and heterogeneous matching scenarios. The results highlight the importance of view decoupling in AGPReID and demonstrate the effectiveness of the VDT framework.
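To make the decoupling mechanism concrete, below is a minimal PyTorch-style sketch of the two ideas named in the summary: subtractive separation (view-specific content absorbed by a view token is subtracted from the identity token after each block) and an orthogonal loss that pushes the two features apart. The function names, the two-token design, and the 768-dimensional width are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def subtractive_separation(meta_tok: torch.Tensor, view_tok: torch.Tensor) -> torch.Tensor:
    """Peel view-related content out of the identity ("meta") token.

    Sketch only: we assume the view token has absorbed view-specific cues,
    which are subtracted from the meta token after each transformer block,
    applied hierarchically across the network's depth.
    """
    return meta_tok - view_tok

def orthogonal_loss(meta_tok: torch.Tensor, view_tok: torch.Tensor) -> torch.Tensor:
    """Encourage the decoupled features to be orthogonal (zero cosine similarity)."""
    cos = F.cosine_similarity(meta_tok, view_tok, dim=-1)
    return cos.abs().mean()

# Toy usage: a batch of 4 feature pairs with an assumed ViT-like width of 768.
meta = torch.randn(4, 768)
view = torch.randn(4, 768)
meta = subtractive_separation(meta, view)
loss = orthogonal_loss(meta, view)
print(loss.item())
```

In this reading, the subtraction removes view bias from the identity representation at every stage, while the orthogonal loss prevents the two branches from encoding redundant information, which is consistent with the summary's claim that VDT handles both homogeneous and heterogeneous matching.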