This paper proposes an unsupervised framework for cross-view geo-localization (CVGL) that leverages unlabeled data to overcome the limitations of supervised methods. Traditional CVGL relies on labeled ground-satellite image pairs, but collecting such pairs is costly and often impractical. The proposed framework addresses this by using a cross-view projection to generate initial pseudo-labels and a fast re-ranking mechanism to refine them. The framework achieves competitive performance on three open-source benchmarks, and its implementation is available on GitHub.
The paper introduces Unsupervised Cross-View Geo-Localization (UCVGL), which does not rely on GPS labels or ground-truth correspondences. Instead, it uses spatial correspondences to align cross-view images. The framework includes a cold-start stage to generate initial pseudo-labels and a semi-supervised stage to refine them. The cold-start stage projects ground panoramas into a 3D coordinate system and uses CycleGAN to generate fake satellite images, enabling the retrieval of over 40% of image pairs. The semi-supervised stage refines pseudo-labels using a threshold filter and curriculum learning, improving the correctness ratio from around 30% to 80%.
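The projection idea in the cold-start stage can be illustrated with a small geometric sketch. It is not the paper's implementation: it assumes a flat ground plane, an equirectangular panorama whose centre column faces north, and a known camera height, and the function names `ground_point_to_pano_pixel` and `bev_sampling_grid` are hypothetical. For each cell of a top-down grid, it computes which panorama pixel would be sampled.

```python
import math


def ground_point_to_pano_pixel(x, y, cam_height, pano_w, pano_h):
    """Map a ground-plane point (x east, y north, in metres) to a panorama pixel.

    Assumptions (not from the paper): flat ground, camera at (0, 0, cam_height),
    equirectangular panorama with column pano_w / 2 facing north.
    """
    dist = math.hypot(x, y)
    azimuth = math.atan2(x, y)                 # 0 = north, positive towards east
    depression = math.atan2(cam_height, dist)  # angle below the horizon
    # Columns wrap around horizontally; rows below the horizon lie in [H/2, H].
    u = ((azimuth / (2 * math.pi) + 0.5) * pano_w) % pano_w
    v = min((0.5 + depression / math.pi) * pano_h, pano_h - 1)
    return u, v


def bev_sampling_grid(size, metres_per_pixel, cam_height, pano_w, pano_h):
    """Return, for each cell of a size x size top-down grid, the panorama pixel to sample."""
    half = size / 2
    grid = []
    for row in range(size):
        line = []
        for col in range(size):
            x = (col + 0.5 - half) * metres_per_pixel  # east offset of the cell centre
            y = (half - row - 0.5) * metres_per_pixel  # north offset (row 0 is the top)
            line.append(ground_point_to_pano_pixel(x, y, cam_height, pano_w, pano_h))
        grid.append(line)
    return grid
```

Sampling the panorama at these coordinates yields a rough top-down view without any learned correspondence. In the framework described above, such a projected view would then be passed to CycleGAN to make it look more like a satellite image.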
The framework trains its encoders with a soft symmetrical InfoNCE loss and uses self-supervised contrastive learning to enhance feature alignment. The cold-start stage uses correspondence-free projection to align cross-view images, while the semi-supervised stage uses mutual matching and threshold filtering to refine pseudo-labels. The framework is evaluated on three datasets (CVUSA, CVACT, VIGOR) and achieves performance competitive with recent supervised methods.
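The pseudo-label refinement and the loss can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' code: "soft" is interpreted here as label smoothing, the temperature and smoothing values are placeholders, and all function names are hypothetical. Pairs are kept only if the two images are each other's nearest neighbour and their similarity passes a threshold; the loss is then averaged over both retrieval directions.

```python
import math


def cosine_similarity_matrix(ground, satellite):
    """Cosine similarity between every ground and every satellite embedding."""
    def normalise(vec):
        norm = math.sqrt(sum(c * c for c in vec)) or 1.0
        return [c / norm for c in vec]
    g = [normalise(v) for v in ground]
    s = [normalise(v) for v in satellite]
    return [[sum(a * b for a, b in zip(gv, sv)) for sv in s] for gv in g]


def mutual_match_pseudo_labels(sim, threshold):
    """Keep (ground_idx, sat_idx) pairs that are mutual nearest neighbours
    and whose similarity is at least `threshold`."""
    n_g, n_s = len(sim), len(sim[0])
    best_sat = [max(range(n_s), key=lambda j: sim[i][j]) for i in range(n_g)]
    best_ground = [max(range(n_g), key=lambda i: sim[i][j]) for j in range(n_s)]
    return [(i, j) for i, j in enumerate(best_sat)
            if best_ground[j] == i and sim[i][j] >= threshold]


def soft_cross_entropy(logits, target_idx, smoothing):
    """Cross-entropy against a label-smoothed one-hot target (an assumed form of 'soft')."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    off = smoothing / len(logits)
    return -sum(((1.0 - smoothing) + off if k == target_idx else off) * (l - log_z)
                for k, l in enumerate(logits))


def soft_symmetric_infonce(sim, pairs, temperature=0.07, smoothing=0.1):
    """Average the ground-to-satellite and satellite-to-ground losses over the pairs."""
    if not pairs:
        return 0.0
    rows = [i for i, _ in pairs]
    cols = [j for _, j in pairs]
    logits = [[sim[i][j] / temperature for j in cols] for i in rows]
    n = len(pairs)
    g2s = sum(soft_cross_entropy(logits[k], k, smoothing) for k in range(n))
    s2g = sum(soft_cross_entropy([logits[r][k] for r in range(n)], k, smoothing)
              for k in range(n))
    return (g2s + s2g) / (2 * n)
```

A curriculum could be approximated by calling `mutual_match_pseudo_labels` with a strict threshold in early epochs and relaxing it as the encoders improve, although the exact schedule used in the paper is not specified here.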
The paper highlights the challenges of unsupervised CVGL, including the cold-start problem and the need for robust spatial alignment. The proposed framework addresses these challenges by leveraging spatial correspondences and pseudo-label refinement. The results show that the framework can effectively utilize unlabeled data to achieve competitive performance in CVGL without relying on labeled data.