xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token

22 May 2024 | Xin Cheng, Xun Wang, Xingxing Zhang, Tao Ge, Si-Qing Chen, Furu Wei, Huishuai Zhang, Dongyan Zhao
This paper introduces xRAG, a context compression method for retrieval-augmented generation. xRAG reinterprets the document embeddings produced by dense retrieval as features from a "retrieval modality" and fuses them into the language model's representation space through a modality-fusion bridge, eliminating the need to feed in the retrieved text itself and thereby achieving an extreme compression rate: each retrieved document is represented by a single token. The modality bridge is the only trainable component; the retriever and the language model remain frozen. This design allows offline-constructed document embeddings to be reused and preserves the plug-and-play nature of retrieval augmentation. Experimentally, xRAG achieves an average improvement of over 10% across six knowledge-intensive tasks, adapts to various language model backbones, and reduces total FLOPs by a factor of 3.53. It outperforms previous context compression methods and matches the performance of uncompressed models on several datasets, demonstrating both the effectiveness and the efficiency of the approach.
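The summary describes the architecture only at a high level. As an illustration, the sketch below (PyTorch, with placeholder dimensions and a hypothetical two-layer MLP as the bridge; the paper's exact projector may differ) shows the core idea: a single dense-retrieval document embedding is projected into the frozen language model's embedding space and prepended to the prompt embeddings as one extra token, in place of the full retrieved text.

```python
import torch
import torch.nn as nn


class ModalityBridge(nn.Module):
    """Hypothetical two-layer MLP that maps a retriever document embedding
    into the language model's token-embedding space (one token per document)."""

    def __init__(self, retriever_dim: int, lm_hidden_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(retriever_dim, lm_hidden_dim),
            nn.GELU(),
            nn.Linear(lm_hidden_dim, lm_hidden_dim),
        )

    def forward(self, doc_embedding: torch.Tensor) -> torch.Tensor:
        # (batch, retriever_dim) -> (batch, 1, lm_hidden_dim): a single "xRAG token"
        return self.proj(doc_embedding).unsqueeze(1)


# Illustrative usage with placeholder sizes: the projected embedding stands in
# for the retrieved document text, so the prompt grows by one token instead of
# hundreds. The retriever and the language model themselves stay frozen; only
# the bridge would be trained.
bridge = ModalityBridge(retriever_dim=768, lm_hidden_dim=4096)
doc_embedding = torch.randn(1, 768)           # offline-built dense retrieval vector
xrag_token = bridge(doc_embedding)            # shape: (1, 1, 4096)
prompt_embeddings = torch.randn(1, 32, 4096)  # frozen LM's embeddings of the question
lm_inputs = torch.cat([xrag_token, prompt_embeddings], dim=1)
print(lm_inputs.shape)                        # torch.Size([1, 33, 4096])
```

Because only the bridge has trainable parameters, existing document-embedding indexes can be reused as-is, which is what preserves the plug-and-play character of retrieval augmentation described above.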