19 Jul 2024 | Shangyu Wu, Ying Xiong, Yufei Cui, Haolun Wu, Can Chen, Ye Yuan, Lianming Huang, Xue Liu, Tei-Wei Kuo, Nan Guan, Chun Jason Xue
This paper reviews retrieval-augmented generation (RAG) techniques for natural language processing (NLP). RAG leverages external knowledge databases to enhance large language models (LLMs), addressing issues such as hallucination, outdated knowledge, and limited domain-specific expertise. The paper systematically introduces RAG components, including retrievers, retrieval fusions, and generators. It discusses RAG training strategies, both with and without datastore updates, and explores applications in NLP tasks and industrial scenarios. It also identifies future directions and challenges for RAG.

Key components of RAG include the retriever, which involves encoding, indexing, and querying external knowledge, and retrieval fusions, which integrate retrieved information into generation. The paper provides tutorial code for implementing RAG techniques and discusses various fusion methods, including query-based, logits-based, and latent fusions. It also covers generator types, such as default and retrieval-augmented generators, and discusses training strategies for RAG. The paper highlights the importance of RAG in improving the accuracy, knowledge, and reliability of LLMs across a range of NLP applications.
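To make the retriever pipeline (encode, index, query) and query-based fusion concrete, here is a minimal, self-contained sketch. It is not the paper's tutorial code: the bag-of-words encoder, brute-force index, and prompt template below are illustrative stand-ins for the dense encoders, ANN indexes (e.g. FAISS), and LLM prompts a real RAG system would use.

```python
# Minimal sketch of a RAG retriever plus query-based fusion.
# All names here are illustrative assumptions, not APIs from the paper.
import math
import re
from collections import Counter

def encode(text):
    """Toy bag-of-words encoder standing in for a dense encoder (e.g. BERT)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class Retriever:
    """Encodes and indexes a datastore, then answers top-k queries."""
    def __init__(self, docs):
        self.docs = docs
        # Brute-force index; production systems use ANN indexes (e.g. FAISS).
        self.index = [encode(d) for d in docs]

    def query(self, q, k=2):
        qv = encode(q)
        ranked = sorted(range(len(self.docs)),
                        key=lambda i: cosine(qv, self.index[i]),
                        reverse=True)
        return [self.docs[i] for i in ranked[:k]]

def query_based_fusion(query, retrieved):
    """Query-based fusion: concatenate retrieved passages with the query
    to form the augmented prompt that would be fed to the generator (LLM)."""
    context = "\n".join(retrieved)
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "RAG combines retrieval with generation.",
    "Transformers use self-attention.",
    "External datastores reduce hallucination.",
]
retriever = Retriever(docs)
retrieved = retriever.query("How does RAG reduce hallucination?", k=2)
prompt = query_based_fusion("How does RAG reduce hallucination?", retrieved)
print(prompt)
```

Logits-based and latent fusions differ only in where the retrieved knowledge enters: instead of augmenting the input text as above, they combine retrieved information with the generator's output distributions or hidden states, respectively.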