Ekaterina Khramtsova*, Shengyao Zhuang*, Mahsa Baktashmotlagh, Guido Zuccon | July 14–18, 2024, Washington, DC, USA
This paper introduces LARMOR (Large Language Model Assisted Retrieval Model Ranking), an unsupervised approach that leverages Large Language Models (LLMs) to select the most effective dense retriever for a target corpus. The method addresses the challenge of dense retriever selection under domain shift, where the target corpus differs from the training corpus and where access to queries and relevance labels is often limited or unavailable. LARMOR generates pseudo-relevant queries, pseudo-relevance judgments, and pseudo-reference lists from documents sampled from the target corpus. These pseudo-relevance signals are then used to rank dense retrievers. The effectiveness of LARMOR is evaluated using a large pool of state-of-the-art dense retrievers, demonstrating superior performance compared to existing baselines. The method is the first to rely solely on the target corpus, eliminating the need for training corpora and test labels. The paper also includes a thorough ablation study and explores the impact of LLM model size, backbone, and the number of generated queries. LARMOR is integrated into the DenseQuest system, which implements dense retriever selection over custom collections.
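To make the selection pipeline concrete, the sketch below shows one simplified way such an approach could be organised; it is not the authors' implementation. It assumes a hypothetical LLM-backed helper generate_query(doc) that produces a pseudo-query for a sampled document, candidate dense retrievers exposed through a retrieve(query, k) callable, and a simplification in which each sampled document is treated as the sole relevant document for its own pseudo-query (the full LARMOR method additionally uses LLM-generated relevance judgments and pseudo-reference lists, which are omitted here).

import math
import random
from typing import Callable, Dict, List, Tuple

def ndcg_at_k(ranked_ids: List[str], relevant_id: str, k: int = 10) -> float:
    # Binary nDCG@k with a single relevant document, so the ideal DCG is 1.
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id == relevant_id:
            return 1.0 / math.log2(rank + 1)
    return 0.0

def rank_dense_retrievers(
    corpus: Dict[str, str],                # doc_id -> text of the target corpus
    retrievers: Dict[str, Callable],       # name -> retrieve(query, k) returning ranked doc_ids
    generate_query: Callable[[str], str],  # assumed LLM-based pseudo-query generator
    num_queries: int = 100,
    k: int = 10,
) -> List[Tuple[str, float]]:
    # Sample documents from the target corpus and generate one pseudo-query per document.
    sampled_ids = random.sample(list(corpus), min(num_queries, len(corpus)))
    pseudo_queries = [(doc_id, generate_query(corpus[doc_id])) for doc_id in sampled_ids]

    # Score each candidate dense retriever on the pseudo-queries.
    scores = {}
    for name, retrieve in retrievers.items():
        per_query = [
            ndcg_at_k(retrieve(query, k), relevant_id=doc_id, k=k)
            for doc_id, query in pseudo_queries
        ]
        scores[name] = sum(per_query) / len(per_query)

    # Return retrievers ranked from most to least effective on the pseudo-signals.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

The design point the sketch illustrates is that every signal used for ranking the candidate retrievers is derived from the target corpus itself, so no training corpora, held-out queries, or human relevance labels are required at selection time.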