21 Mar 2024 | Quan Zhang, Lei Wang, Vishal M. Patel, Xiaohua Xie, Jianhuang Lai
This paper addresses person re-identification (ReID) in aerial-ground camera networks (AGPReID), where the large viewpoint discrepancy between aerial and ground cameras disrupts identity representations. To tackle this, the authors propose a View-decoupled Transformer (VDT) that separates view-related from view-unrelated features through two main components: hierarchical subtractive separation, which progressively strips view-related features out of the global features, and an orthogonal loss, which constrains the two feature sets to be independent. The authors also contribute CARGO, a large-scale synthetic dataset with five aerial and eight ground cameras, 5,000 identities, and 108,563 images. Experiments on CARGO and the AG-ReID dataset show that VDT outperforms previous methods by up to 5.0%/2.7% mAP/Rank-1 on CARGO and 3.7%/5.2% on AG-ReID, with no increase in computational complexity. These results underscore the importance of view decoupling in AGPReID for mitigating the view bias that corrupts identity representations.
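To make the decoupling idea concrete, here is a minimal PyTorch sketch of the two components described above. The function names, tensor shapes, and the single-step simplification are assumptions for illustration; in the paper the separation is applied hierarchically across transformer layers, whereas this sketch operates on one pair of pooled feature vectors.

```python
import torch
import torch.nn.functional as F

def subtractive_separation(global_feat: torch.Tensor,
                           view_feat: torch.Tensor) -> torch.Tensor:
    """Obtain a view-unrelated (identity) feature by subtracting the
    view-related component from the global representation."""
    return global_feat - view_feat

def orthogonal_loss(id_feat: torch.Tensor,
                    view_feat: torch.Tensor) -> torch.Tensor:
    """Drive the absolute cosine similarity between the two branches
    toward zero, encouraging view-related and view-unrelated features
    to be independent."""
    cos = F.cosine_similarity(id_feat, view_feat, dim=-1)
    return cos.abs().mean()

# Toy usage: a batch of 4 samples with 768-dim transformer features.
global_feat = torch.randn(4, 768)
view_feat = torch.randn(4, 768, requires_grad=True)

id_feat = subtractive_separation(global_feat, view_feat)
loss = orthogonal_loss(id_feat, view_feat)
loss.backward()  # gradients push the view branch away from identity cues
```

The key design point is that subtraction alone does not guarantee the residual is free of view information; the orthogonality constraint is what prevents the two branches from collapsing onto overlapping directions in feature space.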