Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection

25 May 2024 | Yun Zhu, Jia-Chen Gu, Caitlin Sikora, Ho Ko, Yinxiao Liu, Chu-Cheng Lin, Lei Shu, Liangchen Luo, Lei Meng, Bang Liu, Jindong Chen
The paper introduces Sparse RAG, a paradigm designed to address the increased input length and latency of Retrieval-Augmented Generation (RAG) systems. Sparse RAG reduces computational cost by encoding retrieved documents in parallel and then decoding the output while attending only to the caches of highly relevant documents. By combining the assessment of each individual document with the generation process, it improves both efficiency and generation quality. The method is evaluated on two datasets, PopQA and Biography, demonstrating a balanced trade-off between generation quality and computational efficiency across short- and long-form generation tasks. Sparse RAG outperforms both dense RAG and PCW RAG in quality and latency, making it a versatile and effective solution for RAG systems.
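To make the idea concrete, here is a minimal sketch of sparse context selection, not the authors' implementation. It assumes toy stand-ins for the real LLM components: `encode_documents` (per-document KV caches are faked with random matrices), `score_relevance` (a word-overlap score, whereas Sparse RAG derives relevance from the LLM itself), and `select_sparse_caches` (top-k selection). All three names are hypothetical.

```python
# Sketch only: illustrates encoding documents independently, scoring them,
# and keeping just the most relevant caches before decoding.
import numpy as np


def encode_documents(documents):
    """Encode each retrieved document independently (in parallel in practice),
    producing one key/value cache per document. A random matrix stands in for
    the real KV cache here."""
    rng = np.random.default_rng(0)
    return [rng.standard_normal((len(doc.split()), 8)) for doc in documents]


def score_relevance(query, documents):
    """Toy relevance score based on word overlap between query and document.
    Sparse RAG obtains such scores from the model itself; this is a stand-in."""
    q = set(query.lower().split())
    return [len(q & set(doc.lower().split())) / (len(q) or 1) for doc in documents]


def select_sparse_caches(caches, scores, k=2):
    """Keep only the caches of the k highest-scoring documents, so decoding
    attends to a much shorter effective context."""
    top = np.argsort(scores)[::-1][:k]
    return [caches[i] for i in sorted(top)]


if __name__ == "__main__":
    query = "Who wrote the biography of Ada Lovelace?"
    docs = [
        "Ada Lovelace biography was written by Doris Langley Moore.",
        "Python is a programming language created by Guido van Rossum.",
        "Retrieval augmented generation combines retrieval with generation.",
    ]
    caches = encode_documents(docs)        # parallel, per-document encoding
    scores = score_relevance(query, docs)  # assess each document individually
    selected = select_sparse_caches(caches, scores, k=1)
    print(f"scores={scores}, caches kept={len(selected)}/{len(caches)}")
```

In this sketch the speedup comes from decoding against a fraction of the retrieved context: only the selected caches would be passed to the decoder, rather than the concatenation of all retrieved documents.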