This paper introduces a novel concept-based Retrieval-Augmented Generation (RAG) framework that uses Abstract Meaning Representation (AMR) to distill essential concepts from long-context supporting documents, enabling Large Language Models (LLMs) to focus on the most supportive knowledge for accurate question answering. The proposed AMR-based concept distillation algorithm systematically traverses the AMR graph, extracting key concept nodes with informative semantic features and thereby transforming redundant supporting documents into a concise concept set. The framework significantly improves RAG performance over baselines spanning various backbone LLMs and context compression methods. This is the first work to augment RAG with AMR, offering a new direction for integrating reliable structured semantic representations into RAG for tasks requiring high fidelity to the supporting knowledge.
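To make the distillation step concrete, the following is a minimal sketch using the `penman` library for AMR graph decoding. The example sentence, the traversal order, and the `STOP_CONCEPTS` filtering rule are illustrative assumptions, not the paper's exact algorithm or its "reliable linguistic features".

```python
# Minimal sketch of AMR-based concept distillation (pip install penman).
# The filtering rules below are assumptions for illustration only.
import penman

# AMR graph for "The boy wants to go to the city." in standard notation.
AMR_STRING = """
(w / want-01
   :ARG0 (b / boy)
   :ARG1 (g / go-02
            :ARG0 b
            :ARG4 (c / city)))
"""

# Assumed set of low-information frames to drop; the paper's actual
# selection criteria may differ.
STOP_CONCEPTS = {"and", "or", "amr-unknown", "multi-sentence"}

def distill_concepts(amr_string: str) -> list[str]:
    """Traverse an AMR graph and collect informative concept nodes."""
    graph = penman.decode(amr_string)
    concepts = []
    # Each instance triple maps a variable to its concept label,
    # e.g. ('w', ':instance', 'want-01').
    for _, _, concept in graph.instances():
        # Strip PropBank sense suffixes such as "-01" for readability.
        base = concept.rsplit("-", 1)[0] if concept[-1].isdigit() else concept
        if base not in STOP_CONCEPTS and base not in concepts:
            concepts.append(base)
    return concepts

print(distill_concepts(AMR_STRING))  # ['want', 'boy', 'go', 'city']
```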
The framework compresses cluttered raw retrieved documents into a compact set of crucial concepts distilled from the informative nodes of the AMR graphs by referring to reliable linguistic features. These concepts explicitly constrain LLMs to attend only to vital information during inference. Extensive experiments on the open-domain question-answering datasets PopQA and EntityQuestions show that the concept-based RAG framework outperforms baseline methods built on various context compression methods and backbone LLMs, with the margin widening as the number of supporting documents grows. This indicates that the distilled concepts are informative for augmenting the RAG process by filtering out interfering information, and demonstrates the framework's applicability to long-context RAG scenarios. By leveraging AMR's inherent structured semantic representation to capture core concepts explicitly, the framework provides more reliable and informative support for the RAG process; its robustness and generality across LLMs make it a versatile solution for enhancing inference performance in long-context settings.
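The sketch below illustrates how a distilled concept set might replace raw retrieved documents in the generation prompt, which is how the concepts "explicitly constrain" the LLM. The template wording and `build_concept_prompt` helper are hypothetical placeholders, not the paper's actual prompt.

```python
# Hedged sketch: inject distilled concepts, rather than full documents,
# into the prompt sent to the backbone LLM. Template is an assumption.
def build_concept_prompt(question: str, concepts: list[str]) -> str:
    concept_block = ", ".join(concepts)
    return (
        "Answer the question using ONLY the key concepts below, "
        "which were distilled from the retrieved documents.\n"
        f"Key concepts: {concept_block}\n"
        f"Question: {question}\n"
        "Answer:"
    )

prompt = build_concept_prompt(
    "What city does the boy want to go to?",
    ["want", "boy", "go", "city"],
)
# The prompt can then be sent to any backbone LLM; the framework is
# model-agnostic, which is what the robustness claim above refers to.
```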