July 14–18, 2024 | Florin Cuconasu*, Giovanni Trappolini*, Federico Siciliano, Simone Filice, Cesare Campagnano, Yoelle Maarek, Nicola Tonellotto, Fabrizio Silvestri
This paper explores the impact of retrieved documents on Retrieval-Augmented Generation (RAG) systems, focusing on the characteristics of an effective retriever. The study uses the Natural Questions (NQ) dataset and evaluates the performance of various Large Language Models (LLMs) with different types of documents: relevant, distracting, and random. Key findings include:
1. **Position of Relevant Information**: Placing the relevant document close to the query in the prompt significantly improves LLM performance.
2. **Impact of Distracting Documents**: Distracting documents, i.e., documents semantically related to the query that do not contain the answer, markedly degrade LLM effectiveness.
3. **Random Documents**: Surprisingly, adding random documents entirely unrelated to the query can improve LLM accuracy by up to 35% when they are positioned correctly.
The study also proposes heuristics for optimizing prompt construction in RAG systems and highlights the need for further research to understand the underlying mechanisms and develop specialized information retrieval techniques for generative models.
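As a concrete illustration of the prompt-construction heuristic implied by these findings, the minimal sketch below orders retrieved documents so that the highest-scoring one sits immediately before the query. The `Document` class and `build_rag_prompt` function are hypothetical names for illustration, not code from the paper or any library.

```python
# Minimal sketch of the prompt-construction heuristic suggested by the findings:
# place the document most likely to contain the answer closest to the query.
# All names here (Document, build_rag_prompt) are illustrative, not from the paper.

from dataclasses import dataclass

@dataclass
class Document:
    text: str
    score: float  # retriever relevance score; higher means more relevant

def build_rag_prompt(query: str, documents: list[Document]) -> str:
    """Order retrieved documents so the highest-scoring one sits
    immediately before the query, where the study found LLMs use
    it most reliably."""
    # Sort ascending by score: the most relevant document ends up last,
    # i.e., adjacent to the query at the bottom of the prompt.
    ordered = sorted(documents, key=lambda d: d.score)
    context = "\n\n".join(f"Document: {d.text}" for d in ordered)
    return (
        "Answer the question using only the documents below.\n\n"
        f"{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

if __name__ == "__main__":
    docs = [
        Document("The Eiffel Tower is in Paris.", score=0.92),
        Document("Paris hosted the 1900 World's Fair.", score=0.40),
    ]
    print(build_rag_prompt("Where is the Eiffel Tower?", docs))
```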