19 Feb 2024 | Chanwoong Yoon, Gangwoo Kim, Byeongguk Jeon, Sungdong Kim, Yohan Jo, Jaewoo Kang
This paper introduces RETPO, a framework for optimizing large language models (LLMs) to generate query rewrites that align with the preferences of retrieval systems in conversational search. It targets the difficulty of rewriting queries in dialogue. There, conventional retrievers often fail because a user's question depends on earlier turns (for example, a pronoun that refers to an entity mentioned before) and cannot be understood on its own. RETPO uses the retriever's own preferences to guide the optimization of the rewriter, yielding more effective, context-aware rewrites.
The framework involves three steps: (1) generating a variety of candidate rewrites with a more capable LLM; (2) collecting retrieval-performance feedback on these rewrites to build a large-scale dataset called RF COLLECTION; and (3) fine-tuning a smaller LLM to align with the retriever's preferences. The resulting model achieves state-of-the-art performance on two recent conversational search benchmarks, QReCC and TopiOCQA, significantly outperforming existing baselines, including GPT-3.5.
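Steps (2) and (3) can be illustrated with a small sketch. The summary does not say how retrieval feedback is scored or which alignment objective is used, so the following rests on two assumptions. First, each candidate rewrite is scored by the reciprocal rank of the gold passage it retrieves. Second, the smaller model is aligned on pairwise preferences with a Direct Preference Optimization (DPO) style loss. The names `Candidate`, `feedback_score`, `preference_pairs`, and `dpo_loss` are hypothetical and are not taken from the paper's code.

```python
import math
from dataclasses import dataclass
from itertools import combinations
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    """A candidate rewrite produced by the larger LLM (step 1). Hypothetical structure."""
    rewrite: str
    gold_rank: Optional[int]  # 1-based rank of the gold passage; None if not retrieved


def feedback_score(c: Candidate) -> float:
    """ASSUMED retrieval feedback: reciprocal rank of the gold passage (0 if missed)."""
    return 0.0 if c.gold_rank is None else 1.0 / c.gold_rank


def preference_pairs(candidates: List[Candidate], margin: float = 0.0) -> List[Tuple[str, str]]:
    """Turn scored rewrites of one turn into (chosen, rejected) pairs.

    Pairs whose scores differ by no more than `margin` are skipped.
    """
    pairs = []
    for a, b in combinations(candidates, 2):
        sa, sb = feedback_score(a), feedback_score(b)
        if sa - sb > margin:
            pairs.append((a.rewrite, b.rewrite))
        elif sb - sa > margin:
            pairs.append((b.rewrite, a.rewrite))
    return pairs


def _neg_log_sigmoid(x: float) -> float:
    """Numerically stable -log(sigmoid(x)) = log(1 + exp(-x))."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """ASSUMED objective: DPO-style loss for one pair, given sequence
    log-probabilities under the model being trained (policy) and a frozen
    reference model."""
    delta = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return _neg_log_sigmoid(beta * delta)


# Example: rewrites of the follow-up "Who directed it?" after a turn about Titanic.
candidates = [
    Candidate("Who directed the 1997 film Titanic?", gold_rank=1),
    Candidate("Who directed Titanic?", gold_rank=3),
    Candidate("Who directed it?", gold_rank=None),
]
pairs = preference_pairs(candidates)  # three (chosen, rejected) pairs
```

With a margin of 0, every pair of candidates with different scores yields one (chosen, rejected) pair. A positive margin would keep only pairs whose retrieval quality differs clearly. The DPO-style loss decreases as the trained model raises the chosen rewrite's log-probability, relative to the reference model, more than the rejected one's.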
The paper also presents an extensive analysis of the generated rewrites, showing that they are more informative and specific than human-written rewrites and that GPT-4 prefers them for clarity and informativeness. Retrieval performance is measured with Mean Reciprocal Rank (MRR), Normalized Discounted Cumulative Gain at rank 3 (NDCG@3), and Recall@k, and the results demonstrate improved retrieval performance.
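For reference, the retrieval metrics named above can be computed per query as follows. This is a minimal sketch that assumes binary relevance judgments (a passage is either relevant or not) and one ranked list of passage IDs per conversational turn. The benchmarks' official evaluation scripts may differ in details such as graded relevance, tie handling, or whether Recall@k counts any hit in the top k. The function names are illustrative.

```python
import math
from typing import List, Sequence, Set


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1/rank of the first relevant passage, or 0 if none is retrieved."""
    for rank, pid in enumerate(ranked, start=1):
        if pid in relevant:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant passages that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for pid in ranked[:k] if pid in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 3) -> float:
    """NDCG@k with binary relevance and a 1/log2(rank + 1) discount."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, pid in enumerate(ranked[:k], start=1) if pid in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def mean_reciprocal_rank(runs: List[Sequence[str]], qrels: List[Set[str]]) -> float:
    """MRR: average reciprocal rank over all queries (conversational turns)."""
    return sum(reciprocal_rank(r, q) for r, q in zip(runs, qrels)) / len(runs)


# One turn whose single gold passage "p2" is retrieved at rank 2.
ranked = ["p7", "p2", "p9", "p4"]
relevant = {"p2"}
rr = reciprocal_rank(ranked, relevant)   # 0.5
r1 = recall_at_k(ranked, relevant, 1)    # 0.0
r3 = recall_at_k(ranked, relevant, 3)    # 1.0
ndcg3 = ndcg_at_k(ranked, relevant, 3)   # 1 / log2(3), about 0.631
mrr = mean_reciprocal_rank([ranked, ["p2", "p1"]], [{"p2"}, {"p2"}])  # (0.5 + 1.0) / 2 = 0.75
```

When each turn has exactly one gold passage, as is common in these benchmarks, Recall@k reduces to whether that passage appears in the top k.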
Finally, the paper discusses the framework's limitations, including its exclusive focus on larger-scale language models and the potential for further gains with access to the full TopiOCQA dataset. In addition, the framework was evaluated only on conversational search, although it is not inherently limited to this task; future work could adapt it to a broader range of tasks and domains. The paper concludes that RETPO advances conversational search performance and shows promise for generalizing to other tasks.