This paper presents a comprehensive survey of generative information retrieval (GenIR), a paradigm that shifts information retrieval from similarity-based matching to generation. Traditional information retrieval (IR) systems rely on sparse retrieval methods, such as Boolean retrieval, BM25, and UniCOIL, which match queries to documents through term-based indices. With the rise of deep learning, dense retrieval methods such as DPR and ANCE were developed; they capture deeper semantic information about documents and improve retrieval precision. However, both families depend on large-scale document indices and cannot be optimized end-to-end. GenIR instead leverages pre-trained language models to directly generate document identifiers (DocIDs) or user-centric responses, offering more flexibility, efficiency, and creativity.
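To make the similarity-based matching baseline concrete, the following is a minimal sketch of BM25 scoring over a tiny in-memory corpus. The function name `bm25_scores`, the toy documents, and the parameter defaults `k1=1.5` and `b=0.75` are illustrative assumptions, not taken from the survey; production systems use an inverted index rather than scanning every document.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue  # terms absent from the document contribute nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus (hypothetical), tokenized by whitespace.
docs = [
    "generative retrieval generates document identifiers".split(),
    "bm25 ranks documents by term overlap".split(),
    "dense retrieval encodes queries and documents as vectors".split(),
]
scores = bm25_scores("bm25 term overlap".split(), docs)
best = max(range(len(docs)), key=scores.__getitem__)
```

Here `best` is the position of the highest-scoring document. The score depends only on term overlap and corpus statistics, so nothing in this pipeline is trained end-to-end against a retrieval objective.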
GenIR falls into two main patterns: generative document retrieval (GR) and reliable response generation. GR retrieves documents by having a generative model directly produce their DocIDs, which eliminates the need for a large-scale document index and allows end-to-end training. Work on GR has advanced model training, document identifier design, incremental learning, and downstream task adaptation. Reliable response generation uses language models to directly generate accurate, user-centric responses and incorporates source citations to enhance credibility and transparency.
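One common way to make sure a generative retriever emits only identifiers that exist in the corpus is to constrain decoding with a prefix trie over the valid DocIDs. The sketch below illustrates that idea under simplifying assumptions: it decodes greedily rather than with beam search, and `toy_score` is a hypothetical stand-in for a trained language model's next-token score. The DocIDs and all function names are invented for illustration and are not taken from the survey.

```python
def build_trie(docids):
    """Build a nested-dict prefix trie over DocID token sequences."""
    trie = {}
    for docid in docids:
        node = trie
        for tok in docid:
            node = node.setdefault(tok, {})
        node[None] = {}  # None marks the end of a complete DocID
    return trie


def constrained_greedy_decode(score_fn, query, trie):
    """Greedily pick the best-scoring token among those the trie allows."""
    prefix, node = [], trie
    while True:
        allowed = list(node.keys())
        best = max(allowed, key=lambda tok: score_fn(query, prefix, tok))
        if best is None:
            return tuple(prefix)  # end marker chosen: the DocID is complete
        prefix.append(best)
        node = node[best]


def toy_score(query, prefix, tok):
    """Hypothetical scorer; a real system would query a seq2seq model here."""
    if tok is None:
        return 0.5
    return 1.0 if tok in query.split() else 0.0


# Hypothetical hierarchical DocIDs, each a sequence of tokens.
docids = [
    ("sports", "tennis", "01"),
    ("sports", "golf", "02"),
    ("science", "physics", "03"),
]
trie = build_trie(docids)
result = constrained_greedy_decode(toy_score, "tennis sports news", trie)
fallback = constrained_greedy_decode(toy_score, "unrelated", trie)
```

Because every step chooses only among the children of the current trie node, the decoded sequence is always a valid DocID, even when the scorer has no preference, as with the `fallback` query.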
The paper reviews recent research progress in both GR and reliable response generation, and discusses evaluation metrics, challenges, and future directions for GenIR systems. The survey also highlights applications of GenIR such as fact verification, entity linking, open-domain QA, dialogue, slot filling, and multi-modal retrieval. The paper concludes that GenIR has the potential to revolutionize information retrieval by providing more accurate, efficient, and user-centric information access.