This paper investigates the robustness of Large Language Models (LLMs) to irrelevant information, particularly when it is semantically related to the task. The authors construct high-quality irrelevant information, ranging from semantically unrelated to highly related content, and evaluate LLMs' performance under various conditions. Key findings include:
1. **Semantic Relevance**: LLMs are more likely to be misled by highly semantically related irrelevant information compared to unrelated information.
2. **Quantity of Information**: As the quantity of irrelevant information increases, LLMs become less capable of identifying relevant information and are more easily distracted.
3. **Question Format**: Free-form and boolean question formats show greater robustness than multiple-choice formats.
4. **Current Solutions**: Existing strategies, such as Chain-of-Thought (CoT) prompting and in-context learning (ICL), offer only limited improvement in LLMs' ability to handle irrelevant information.
The study highlights the need for more robust methods to mitigate the impact of irrelevant information in RAG systems, emphasizing the importance of reliable and truthful retrieval-augmented generation.
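To make the evaluation setup concrete, the sketch below shows one way such a distraction-robustness probe could be wired up: irrelevant passages are mixed in with the relevant (gold) passage before querying a model, and accuracy is measured as the number of distractors grows. This is an illustrative sketch only, not the paper's actual benchmark; `query_llm`, `build_prompt`, `accuracy_under_distraction`, and the example format are assumptions standing in for whatever model API and dataset are used.

```python
# Illustrative distraction-robustness probe (a sketch, not the paper's benchmark).
# `query_llm` is a hypothetical stand-in for any real LLM API call.

def query_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real client (hosted API or local model)."""
    raise NotImplementedError


def build_prompt(question: str, gold_passage: str, distractors: list[str]) -> str:
    """Interleave the gold passage with irrelevant ones, as a noisy retriever might."""
    half = len(distractors) // 2
    passages = distractors[:half] + [gold_passage] + distractors[half:]
    context = "\n\n".join(f"Passage {i + 1}: {p}" for i, p in enumerate(passages))
    return f"{context}\n\nQuestion: {question}\nAnswer:"


def accuracy_under_distraction(examples: list[dict], distractor_pool: list[str],
                               n_distractors: int) -> float:
    """Fraction of questions answered correctly with n irrelevant passages in the context."""
    correct = 0
    for ex in examples:  # each ex: {"question": ..., "gold_passage": ..., "answer": ...}
        prompt = build_prompt(ex["question"], ex["gold_passage"],
                              distractor_pool[:n_distractors])
        prediction = query_llm(prompt)
        correct += int(ex["answer"].lower() in prediction.lower())
    return correct / len(examples)
```

Sweeping `n_distractors` (e.g., 0, 2, 4, 8) and varying how semantically related the distractor pool is to each question would probe, in spirit, the two axes the paper studies: quantity and semantic relevance of irrelevant information.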