22 Mar 2024 | Dingqiang Ye1,2*, Chao Fan1,2*, Jingzhe Ma1,2, Xiaoming Liu3, and Shiqi Yu1,2†
**BigGait: Learning Gait Representation You Want by Large Vision Models**
This paper addresses gait recognition, a critical remote identification technology, by proposing a framework called BigGait. Traditional gait recognition methods rely heavily on task-specific supervised learning to obtain explicit gait representations, which introduces high annotation costs and potential errors. To overcome these limitations, BigGait uses the all-purpose knowledge produced by Large Vision Models (LVMs) to extract implicit gait representations without requiring dedicated supervision for the representation itself.
**Key Contributions:**
1. **BigGait Framework:** A novel gait recognition framework that transforms all-purpose knowledge from LVMs into effective gait representations.
2. **Gait Representation Extractor (GRE):** A module that includes three branches—Mask, Appearance, and Denoising—to remove background noise, transform features, and refine representations.
3. **Performance:** BigGait outperforms existing methods in both within-domain and cross-domain tasks on datasets like CCPG, CASIA-B*, and SUSTech1K.
4. **Challenges and Future Directions:** Discusses challenges in interpretability and purity of learned representations and suggests future research directions.
**Methodology:**
- **Upstream Model:** Utilizes DINOv2, a self-supervised LVM, to extract all-purpose features.
- **Downstream Model:** Uses an adjusted GaitBase for gait metric learning.
- **GRE Module:** Comprises three branches to handle background removal, feature transformation, and feature refinement.
- **Loss Functions:** Combines recognition losses, mask reconstruction loss, smoothness loss, and diversity loss to optimize the representation.
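The loss combination listed above can be sketched as a weighted sum of named terms. The summary does not give the paper's exact formulas, so everything here is an assumption made for illustration: the smoothness term is written as a total-variation penalty on a 2-D map (a common way to discourage noisy maps, not necessarily the paper's definition), and the function names, term names, and weights are hypothetical.

```python
def total_variation(grid):
    """Mean absolute difference between vertically and horizontally
    adjacent cells of a 2-D map (assumed stand-in for a smoothness loss)."""
    rows, cols = len(grid), len(grid[0])
    diffs = []
    for r in range(rows):
        for c in range(cols):
            if r + 1 < rows:  # neighbour below
                diffs.append(abs(grid[r + 1][c] - grid[r][c]))
            if c + 1 < cols:  # neighbour to the right
                diffs.append(abs(grid[r][c + 1] - grid[r][c]))
    return sum(diffs) / len(diffs) if diffs else 0.0


def combine_losses(terms, weights=None):
    """Weighted sum of named loss values; a missing weight defaults to 1.0.
    The weights are hypothetical, not values taken from the paper."""
    weights = weights or {}
    return sum(weights.get(name, 1.0) * value for name, value in terms.items())


# Hypothetical values, only to show how the terms would be combined.
mask = [[0.0, 1.0], [1.0, 0.0]]
total = combine_losses(
    {"recognition": 1.0, "smoothness": total_variation(mask)},
    weights={"smoothness": 0.1},
)
```

In a real training setup each term would be a differentiable tensor expression rather than a Python float; the sketch only shows how the pieces fit together.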
**Experiments:**
- **Datasets:** CCPG, CASIA-B*, and SUSTech1K.
- **Results:** BigGait achieves superior performance compared to video-based ReID methods and silhouette-based methods.
- **Ablation Study:** Evaluates the effectiveness of each branch and the Pad-and-Resize strategy.
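The summary names the Pad-and-Resize strategy without describing it. A minimal sketch, assuming it means padding a person crop to the target aspect ratio before resizing so that body proportions are not distorted: the function below computes only the padding amounts. The function name and the 224×448 (width × height) default target are assumptions, not details taken from the summary.

```python
def pad_to_aspect(width, height, target_w=224, target_h=448):
    """Return (left, right, top, bottom) padding, in pixels, that brings a
    width x height crop to the target_w:target_h aspect ratio.

    Only one axis is padded, and the padding is split as evenly as possible.
    """
    # Compare width/height with target_w/target_h by cross-multiplying,
    # which avoids floating-point comparisons.
    if width * target_h < height * target_w:
        # Crop is too narrow: widen it (ceiling division keeps integers).
        new_w = -(-height * target_w // target_h)
        extra = new_w - width
        return (extra // 2, extra - extra // 2, 0, 0)
    # Crop is too wide, or already matches: make it taller.
    new_h = -(-width * target_h // target_w)
    extra = new_h - height
    return (0, 0, extra // 2, extra - extra // 2)


# A 100x400 crop needs 50 px on each side to reach a 1:2 aspect ratio.
padding = pad_to_aspect(100, 400)
```

After padding, the crop can be resized to the target resolution with any image library. Because the aspect ratio already matches, the resize scales both axes by the same factor.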
**Conclusion:**
BigGait provides a practical paradigm for learning next-generation gait representations, leveraging LVMs to reduce annotation costs and improve robustness to gait-irrelevant noises. The work highlights the potential of LVMs in gait recognition and opens new avenues for future research.