Adaptive Uncertainty-Based Learning for Text-Based Person Retrieval

2024 | Shenshen Li, Chen He, Xing Xu*, Fumin Shen, Yang Yang, Heng Tao Shen
This paper proposes Adaptive Uncertainty-based Learning (AUL), a framework for text-based person retrieval that addresses two challenges: matching ambiguity and one-sided cross-modal alignment. AUL consists of three components: 1) Uncertainty-aware Matching Filtration (UMF), which uses Subjective Logic to filter out unreliable matching pairs and select high-confidence cross-modal matches; 2) Uncertainty-based Alignment Refinement (UAR), which simulates coarse-grained alignments and progressively integrates coarse- and fine-grained alignments; and 3) Cross-modal Masked Modeling (CMM), which explores comprehensive relations between vision and language. Evaluated on three benchmark datasets (CUHK-PEDES, ICFG-PEDES, and RSTPReid), AUL achieves state-of-the-art performance in supervised, weakly supervised, and domain generalization settings, significantly outperforming recent methods in retrieval accuracy. The results indicate that exploring one-to-many correspondence and quantifying the uncertainty of matching pairs mitigates the impact of matching ambiguity and improves cross-modal alignment. The method is implemented in PyTorch, and the paper includes ablation studies and qualitative analysis that validate each proposed component.
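To make the idea of uncertainty-aware filtration more concrete, here is a minimal sketch of how Subjective Logic can turn matching scores into a belief per candidate plus a single uncertainty value, which is then used to discard low-confidence pairs. It is not the authors' implementation: the evidence function (an exponential of the similarity), the temperature, the threshold, and the names subjective_opinion and filter_matches are all assumptions chosen for illustration. Only the belief and uncertainty formulas follow the standard Dirichlet-based form of Subjective Logic.

```python
import math


def subjective_opinion(evidence):
    """Map non-negative evidence over K candidates to (beliefs, uncertainty).

    Standard Dirichlet-based Subjective Logic: alpha_k = e_k + 1,
    S = sum(alpha), b_k = e_k / S, u = K / S, so sum(b) + u == 1.
    """
    k = len(evidence)
    strength = sum(e + 1.0 for e in evidence)  # Dirichlet strength S
    beliefs = [e / strength for e in evidence]
    uncertainty = k / strength
    return beliefs, uncertainty


def filter_matches(similarity, temperature=0.1, max_uncertainty=0.3):
    """Keep (text_idx, image_idx, uncertainty) for confident rows only.

    similarity[i][j] is a text-to-image score. Using exp(score / temperature)
    as evidence is a hypothetical choice, not taken from the paper.
    """
    kept = []
    for i, row in enumerate(similarity):
        evidence = [math.exp(s / temperature) for s in row]
        beliefs, u = subjective_opinion(evidence)
        if u <= max_uncertainty:
            # Pair the text with the candidate holding the most belief mass.
            j = max(range(len(row)), key=beliefs.__getitem__)
            kept.append((i, j, u))
    return kept


if __name__ == "__main__":
    sims = [
        [0.9, 0.1, 0.0],     # one clearly supported candidate
        [0.05, 0.04, 0.05],  # weak evidence for every candidate
    ]
    for text_idx, image_idx, u in filter_matches(sims):
        print(text_idx, image_idx, round(u, 4))
```

In this form the uncertainty shrinks as total evidence grows, so a row with uniformly weak scores is filtered out while a row with one strongly supported candidate is kept. A real system would likely learn the evidence function rather than fix it by hand.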