28 May 2024 | Jialin Dong, Bahare Fatemi, Bryan Perozzi, Lin F. Yang, Anton Tsitsulin
The paper "Don’t Forget to Connect! Improving RAG with Graph-based Reranking" by Jialin Dong, Bahare Fatemi, Bryan Perozzi, Lin F. Yang, and Anton Tsitsulin addresses the limitations of Retrieval Augmented Generation (RAG) in handling documents with partial information or less obvious connections to the context. The authors introduce G-RAG, a reranker based on graph neural networks (GNNs) that combines document connections and semantic information (via Abstract Meaning Representation graphs) to improve RAG performance. G-RAG outperforms state-of-the-art approaches while requiring less computational resources. The paper also assesses the performance of PaLM 2 as a reranker, finding it to significantly underperform G-RAG, emphasizing the importance of reranking for RAG even when using large language models (LLMs). The contributions of the paper include:
1. Proposing a document-graph-based reranker that leverages connections between documents to identify those containing answers only weakly connected to the question context.
2. Introducing new evaluation metrics for ranking scenarios with tied scores, which mitigate the optimistic bias that ties introduce into standard ranking metrics (see the sketch after this list).
3. Investigating the performance of PaLM 2 as a reranker and finding that it underperforms G-RAG, highlighting the importance of reranker design in RAG.
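To make the tie-aware evaluation in contribution 2 concrete, here is a minimal sketch of a reciprocal rank that does not reward ties optimistically. It assumes ties are resolved by the expected rank of the relevant document within its tie group; the function name and this convention are illustrative assumptions, not the paper's exact metric definitions.

```python
from typing import Sequence

def tie_aware_reciprocal_rank(scores: Sequence[float], relevant_idx: int) -> float:
    """Reciprocal rank of the relevant document under tied scores.

    Instead of optimistically placing the relevant document first among
    its ties (which inflates MRR), use its expected rank: the average
    position it would occupy over all orderings of the tie group. This
    convention is an illustrative assumption, not the paper's exact metric.
    """
    target = scores[relevant_idx]
    better = sum(1 for s in scores if s > target)    # strictly higher-scored docs
    tied = sum(1 for s in scores if s == target)     # tie group, including the target
    expected_rank = better + (tied + 1) / 2          # average position within the ties
    return 1.0 / expected_rank

# Example: three documents share the top score. An optimistic MRR would
# report 1.0; the tie-aware value reflects the expected rank of 2.
print(tie_aware_reciprocal_rank([0.9, 0.9, 0.9, 0.1], relevant_idx=0))  # 0.5
```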
The paper discusses the limitations of current RAG methods, such as their inability to capture connections between documents and the overfitting caused by redundant AMR information. G-RAG addresses these issues with document graphs and a pairwise ranking loss function, which together are more effective at identifying relevant documents. Experiments on two datasets, Natural Questions (NQ) and TriviaQA (TQA), demonstrate the effectiveness of G-RAG, showing significant improvements over baseline models. The paper also explores the impact of different embedding models and the performance of LLMs as rerankers, concluding that fine-tuning LLMs can enhance RAG performance.
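As a rough illustration of the approach, the sketch below combines the two ingredients discussed above: a document graph and a GNN reranker trained with a pairwise ranking loss. It is a minimal sketch under stated assumptions, not the authors' implementation: the shared-AMR-concept edge rule, the single mean-aggregation layer, and names such as `DocGraphReranker` are all illustrative, and question information is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def build_doc_graph(amr_concepts: list[set[str]]) -> torch.Tensor:
    """Adjacency matrix linking documents that share AMR concepts.

    One plausible construction (an assumption, not necessarily the
    paper's): connect documents i and j when their AMR graphs share
    at least one concept node.
    """
    n = len(amr_concepts)
    adj = torch.eye(n)  # self-loops so isolated docs keep their own features
    for i in range(n):
        for j in range(i + 1, n):
            if amr_concepts[i] & amr_concepts[j]:
                adj[i, j] = adj[j, i] = 1.0
    return adj

class DocGraphReranker(nn.Module):
    """Single-layer mean-aggregation GNN scoring each document (illustrative)."""

    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.gnn = nn.Linear(dim, hidden)
        self.score = nn.Linear(hidden, 1)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # Mean aggregation over neighbors (self included via self-loops).
        h = (adj @ x) / adj.sum(dim=1, keepdim=True)
        h = F.relu(self.gnn(h))
        return self.score(h).squeeze(-1)  # one relevance score per document

def pairwise_ranking_loss(scores: torch.Tensor, labels: torch.Tensor,
                          margin: float = 1.0) -> torch.Tensor:
    """Hinge loss pushing every positive document above every negative one."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    diffs = pos.unsqueeze(1) - neg.unsqueeze(0)  # all positive-negative pairs
    return F.relu(margin - diffs).mean()

# Toy usage: 4 retrieved documents with 8-dim embeddings; docs 0 and 2
# contain the answer. Embeddings here are random placeholders.
x = torch.randn(4, 8)
adj = build_doc_graph([{"ship"}, {"sea"}, {"ship", "port"}, {"sky"}])
labels = torch.tensor([1, 0, 1, 0])
model = DocGraphReranker(dim=8)
loss = pairwise_ranking_loss(model(x, adj), labels)
loss.backward()
```

The pairwise loss compares positive and negative documents directly rather than classifying each in isolation, which matches the summary's point that a pairwise ranking objective is more effective at identifying relevant documents.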