MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries

27 Jan 2024 | Yixuan Tang and Yi Yang
**Retrieval-Augmented Generation (RAG)** has shown promise in enhancing large language models (LLMs) by retrieving relevant knowledge, improving response quality, and mitigating hallucinations. However, existing RAG systems struggle with multi-hop queries, which require reasoning over multiple pieces of evidence. To address this gap, the authors introduce MultiHop-RAG, a novel dataset designed for multi-hop queries. The dataset includes a knowledge base, a collection of multi-hop queries, their ground-truth answers, and supporting evidence. The authors detail the construction process, which involves extracting factual sentences from news articles, generating claims, identifying bridge-entities and topics, and creating multi-hop queries. Two experiments are conducted to evaluate the effectiveness of different embedding models and LLMs in handling multi-hop queries. The results show that current RAG methods perform poorly in retrieving and answering multi-hop queries. The authors hope that MultiHop-RAG will serve as a valuable resource for developing and benchmarking effective RAG systems, thereby advancing the adoption of LLMs in practical applications. The dataset and implemented RAG system are publicly available.
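
To make the retrieval challenge concrete, below is a minimal sketch of a dense-retrieval baseline of the kind evaluated in the paper: chunks of the news knowledge base are embedded, a multi-hop query is matched against them by cosine similarity, and retrieval is scored by how many ground-truth evidence pieces appear in the top-k results (a multi-hop query needs all of them). The embedding model name, the toy data, and the `recall_at_k` helper are illustrative assumptions, not the authors' exact setup.

```python
# Minimal sketch of a dense-retrieval baseline for multi-hop queries.
# NOT the authors' implementation; model, chunking, and metric are assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

def build_index(chunks, model):
    # Embed every knowledge-base chunk once; normalize so dot product = cosine.
    return np.asarray(model.encode(chunks, normalize_embeddings=True))

def retrieve(query, chunks, index, model, top_k=10):
    # Embed the multi-hop query and rank chunks by cosine similarity.
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = index @ q
    order = np.argsort(-scores)[:top_k]
    return [chunks[i] for i in order]

def recall_at_k(retrieved, gold_evidence):
    # Fraction of ground-truth evidence pieces covered by the retrieved chunks.
    hits = sum(any(ev in chunk for chunk in retrieved) for ev in gold_evidence)
    return hits / len(gold_evidence)

if __name__ == "__main__":
    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
    chunks = [
        "Company A acquired Startup B in March.",       # toy knowledge base
        "Startup B was founded by Jane Doe.",
        "Company A reported record revenue this quarter.",
    ]
    index = build_index(chunks, model)
    query = "Who founded the startup that Company A acquired?"  # bridge-entity query
    gold = ["Company A acquired Startup B", "Startup B was founded by Jane Doe"]
    top = retrieve(query, chunks, index, model, top_k=2)
    print(top, recall_at_k(top, gold))
```

In a full evaluation, the retrieved chunks would additionally be passed to an LLM to generate an answer, which is then compared against the ground-truth answer, mirroring the paper's two-stage (retrieval and generation) experiments.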