28 Mar 2024 | Zhongliang Zhou, Jielu Zhang, Zihan Guan, Mengxuan Hu, Ni Lao, Lan Mu, Sheng Li, Gengchen Mai
IMG2Loc is a novel system that addresses the challenging problem of image geolocalization by redefining it as a text generation task using large multi-modality models (LMMs) like GPT-4V or LLaVA with retrieval-augmented generation. The system first generates an image-based coordinate query database using CLIP-based representations. It then combines these query results with the images themselves to form customized prompts for LMMs. When tested on benchmark datasets such as Im2GPS3k and YFCC4k, IMG2Loc outperforms previous state-of-the-art models without any model training. The approach leverages the strengths of retrieval methods and the advanced capabilities of contemporary language models, making it a significant advancement in image geolocalization. Key contributions include the first successful demonstration of multi-modality foundation models in geolocalization, a training-free approach, and an effective refined sampling process to minimize inaccurate predictions.
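The retrieve-then-prompt flow described above can be illustrated with a minimal sketch. This is not the authors' implementation: the three-dimensional vectors stand in for CLIP image embeddings, the database entries and coordinates are toy values, and the names `retrieve_neighbors` and `build_prompt` as well as the prompt wording are hypothetical. In a real system the resulting prompt would be sent to an LMM together with the query image.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_neighbors(query_embedding, database, k=2):
    """Return the k database entries most similar to the query embedding."""
    ranked = sorted(
        database,
        key=lambda entry: cosine_similarity(query_embedding, entry["embedding"]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(neighbors):
    """Format retrieved coordinates into a text prompt (wording is hypothetical)."""
    coords = "; ".join(f"({n['lat']:.4f}, {n['lon']:.4f})" for n in neighbors)
    return (
        "Estimate the latitude and longitude where this image was taken. "
        f"Visually similar reference images were taken at: {coords}. "
        "Answer with a single (latitude, longitude) pair."
    )


# Toy database: each entry pairs a stand-in embedding with known coordinates.
DATABASE = [
    {"embedding": [0.9, 0.1, 0.0], "lat": 48.8584, "lon": 2.2945},
    {"embedding": [0.8, 0.2, 0.1], "lat": 48.8606, "lon": 2.3376},
    {"embedding": [0.0, 0.1, 0.9], "lat": 40.6892, "lon": -74.0445},
]

# Stand-in for the embedding of the image to be geolocated.
query = [0.85, 0.15, 0.05]
neighbors = retrieve_neighbors(query, DATABASE, k=2)
prompt = build_prompt(neighbors)
print(prompt)
```

Because the language model is only prompted and never fine-tuned, a pipeline of this shape needs no training step; a production version would likely replace the linear scan with an approximate nearest-neighbor index.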