SURe: Summarizing Retrievals using Answer Candidates for Open-domain QA of LLMs

17 Apr 2024 | Jaehyung Kim, Jaehyun Nam, Sangwoo Mo, Jongjin Park, Sang-Woo Lee, Minjoon Seo, Jung-Woo Ha, Jinwoo Shin
SURe is a framework for improving open-domain question answering (ODQA) with large language models (LLMs). Its core idea is to generate a summary of the retrieved passages conditioned on each of several answer candidates, then use those summaries to evaluate the validity and informativeness of each candidate. The summaries act as explicit rationales derived from the retrieved information, which helps the LLM predict more accurate answers. Concretely, SURe constructs a summary for each candidate, evaluates each summary's validity and relative ranking, and selects the most plausible answer. Experiments on several ODQA benchmarks show that SURe improves exact match (EM) by up to 4.6% and F1 by up to 4.0% over standard prompting methods. SURe is compatible with different retrieval methods and LLMs, and the generated summaries bring additional benefits: they can be used to measure the importance of retrieved passages, and they serve as rationales preferred by both models and humans. The framework is zero-shot, so it applies even without query-relevant few-shot examples. Overall, SURe improves ODQA performance and offers practical advantages for real-world applications.
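The procedure described above (candidate generation, candidate-conditioned summarization, validity and ranking checks, then selection) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `LLM` callable interface, all function names, the prompt wording, and the scoring rule (one point for validity plus one point per pairwise win) are assumptions made for the example.

```python
from typing import Callable, List

# Hypothetical interface: any function that maps a prompt to a completion.
LLM = Callable[[str], str]


def generate_candidates(llm: LLM, question: str, passages: List[str], n: int = 2) -> List[str]:
    """Ask the model for up to n answer candidates given the retrieved passages."""
    context = "\n".join(passages)
    prompt = (
        f"Passages:\n{context}\n\nQuestion: {question}\n"
        f"List {n} candidate answers, one per line."
    )
    lines = [line.strip() for line in llm(prompt).splitlines()]
    return [line for line in lines if line][:n]


def summarize_for(llm: LLM, question: str, passages: List[str], candidate: str) -> str:
    """Summarize the passages conditioned on one specific candidate."""
    context = "\n".join(passages)
    prompt = (
        f"Passages:\n{context}\n\nQuestion: {question}\n"
        f"Summarize the passages as evidence for the answer: {candidate}"
    )
    return llm(prompt).strip()


def is_valid(llm: LLM, question: str, candidate: str, summary: str) -> bool:
    """Check whether the summary actually supports its candidate."""
    prompt = (
        f"Question: {question}\nAnswer: {candidate}\nSummary: {summary}\n"
        "Does the summary support the answer? Reply True or False."
    )
    return llm(prompt).strip().lower().startswith("true")


def prefers_first(llm: LLM, question: str, summary_a: str, summary_b: str) -> bool:
    """Pairwise comparison: is the first summary the more informative one?"""
    prompt = (
        f"Question: {question}\nSummary 1: {summary_a}\nSummary 2: {summary_b}\n"
        "Which summary is more informative? Reply 1 or 2."
    )
    return llm(prompt).strip().startswith("1")


def select_answer(llm: LLM, question: str, passages: List[str], n: int = 2) -> str:
    """Return the candidate whose summary scores highest (assumed scoring rule)."""
    candidates = generate_candidates(llm, question, passages, n)
    if not candidates:
        return ""
    summaries = [summarize_for(llm, question, passages, c) for c in candidates]
    scores = []
    for i, candidate in enumerate(candidates):
        score = 1.0 if is_valid(llm, question, candidate, summaries[i]) else 0.0
        for j in range(len(candidates)):
            if i != j and prefers_first(llm, question, summaries[i], summaries[j]):
                score += 1.0
        scores.append(score)
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best]
```

Because every step is a plain prompt to the same model, a sketch like this needs no few-shot examples and can sit on top of any retriever that returns a list of passages, which mirrors the zero-shot and retriever-agnostic properties the summary attributes to the framework.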