12 Apr 2021 | Patrick Lewis†‡, Ethan Perez*, Aleksandra Piktus†, Fabio Petroni†, Vladimir Karpukhin†, Naman Goyal†, Heinrich Küttler†, Mike Lewis†, Wen-tau Yih†, Tim Rocktäschel†‡, Sebastian Riedel†‡, Douwe Kiela†
The paper introduces Retrieval-Augmented Generation (RAG), a novel approach that combines pre-trained parametric and non-parametric memory for language generation tasks. RAG models use a pre-trained seq2seq model as the parametric memory and a dense vector index of Wikipedia as the non-parametric memory, accessed through a pre-trained neural retriever. The authors explore two RAG formulations: RAG-Sequence, which conditions on the same retrieved passage across the entire generated sequence, and RAG-Token, which can draw on a different passage per token. These models are fine-tuned on a wide range of knowledge-intensive NLP tasks, including open-domain question answering, abstractive question answering, Jeopardy question generation, and fact verification. The results show that RAG models outperform both parametric seq2seq models and task-specific retrieve-and-extract architectures, achieving state-of-the-art performance on several open-domain QA tasks. Additionally, RAG models generate more specific, diverse, and factually accurate responses than a state-of-the-art parametric-only seq2seq baseline. The paper also discusses the benefits of combining parametric and non-parametric memory, the effectiveness of learned retrieval, and the ability to update the model's knowledge without retraining.
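To make the difference between the two formulations concrete, here is a minimal PyTorch sketch of their scoring rules on random toy tensors. It follows the paper's decomposition into a retriever distribution p(z|x) over the top-k retrieved passages and a generator distribution p(y_i | x, z, y_<i); the tensor names and sizes are illustrative assumptions, not the authors' code.

```python
import torch

# Toy setup: k retrieved passages, a target sequence of T tokens, vocab size V.
k, T, V = 5, 4, 100
doc_probs = torch.softmax(torch.randn(k), dim=0)                  # retriever p(z | x)
token_logprobs = torch.log_softmax(torch.randn(k, T, V), dim=-1)  # generator log p(y_i | x, z, y_<i)
target = torch.randint(V, (T,))                                   # a candidate output y

# Log-probability of each target token under each passage: shape (k, T).
tgt_lp = token_logprobs[:, torch.arange(T), target]

# RAG-Sequence marginalizes once over passages for the whole output:
#   p(y|x) = sum_z p(z|x) * prod_i p(y_i | x, z, y_<i)
rag_sequence = torch.logsumexp(doc_probs.log() + tgt_lp.sum(dim=1), dim=0)

# RAG-Token marginalizes per token, so each token can rely on a different passage:
#   p(y|x) = prod_i sum_z p(z|x) * p(y_i | x, z, y_<i)
rag_token = torch.logsumexp(doc_probs.log().unsqueeze(1) + tgt_lp, dim=0).sum()

print(f"RAG-Sequence log p(y|x): {rag_sequence.item():.3f}")
print(f"RAG-Token    log p(y|x): {rag_token.item():.3f}")
```

The two scores coincide only when one passage dominates or when generation length is one; otherwise RAG-Token's per-token marginalization lets different parts of the output be grounded in different retrieved evidence, which is why the paper pairs it with tasks like Jeopardy question generation that combine facts from multiple sources.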