This paper introduces a novel Chunking-Free In-Context (CFIC) retrieval approach for Retrieval-Augmented Generation (RAG) systems. Traditional RAG systems face challenges in grounding responses with precise evidence due to the difficulty of processing long documents and filtering out irrelevant content. Common solutions like document chunking and adapting language models to handle longer contexts have limitations, often disrupting semantic coherence or failing to address noise and inaccuracies in evidence retrieval.
CFIC addresses these challenges by bypassing the conventional chunking process entirely. It uses the encoded hidden states of documents for in-context retrieval, employing auto-regressive decoding to accurately identify the specific evidence text required for a user query. CFIC is further enhanced by two decoding strategies: Constrained Sentence Prefix Decoding and Skip Decoding. These strategies improve retrieval efficiency while maintaining the fidelity of the generated grounding evidence.
Evaluations on open QA datasets show CFIC's superiority in retrieving relevant and accurate evidence, offering significant improvements over traditional methods. By avoiding document chunking, CFIC provides a more streamlined, effective, and efficient retrieval solution for RAG systems.
The CFIC method encodes a document into transformer hidden states. When a user query arrives, CFIC continues encoding the query and task instructions after these cached hidden states, then generates the grounding text. By leveraging the document's encoded hidden states for in-context retrieval, the system auto-regressively decodes and pinpoints the precise evidence text used to ground response generation.
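The encode-once, query-many pattern above can be sketched with a toy encoder. This is illustrative only: `ToyEncoder` and its hash-based "states" are hypothetical stand-ins for a transformer's cached key/value states, not the paper's model.

```python
from typing import List, Optional

class ToyEncoder:
    """Stand-in for a transformer encoder: 'hidden states' here are just
    token hashes. Real CFIC would cache transformer key/value states."""
    def __init__(self) -> None:
        self.tokens_processed = 0

    def encode(self, tokens: List[str],
               past: Optional[List[int]] = None) -> List[int]:
        # Only the *new* tokens are processed; cached states are reused,
        # so the long document is encoded once, not once per query.
        self.tokens_processed += len(tokens)
        new_states = [hash(t) % 997 for t in tokens]
        return (past or []) + new_states

encoder = ToyEncoder()
doc_states = encoder.encode("a long source document".split())  # paid once
states_q1 = encoder.encode("query one".split(), past=doc_states)
states_q2 = encoder.encode("query two".split(), past=doc_states)
```

The point of the sketch is the cost profile: the document's states are computed once, and each query only pays for its own few tokens appended after them.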
CFIC incorporates two decoding strategies: Constrained Sentence Prefix Decoding, which uses sentence prefixes as decoding candidates to shift the model's decision boundary from open-ended to document-dependent generation, and Skip Decoding, which bypasses decoding intermediate tokens and directly selects sentence ends with the highest likelihood of the [eos] token. These strategies enhance the efficiency and accuracy of the retrieval process.
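The two strategies can be sketched as follows. This is a minimal illustration: `prefix_decode`, `skip_decode`, and the toy log-probability scorer are hypothetical names and simplifications, not the paper's implementation, which operates over model token probabilities rather than word overlap.

```python
from typing import Callable, List, Tuple

LogProbFn = Callable[[str, str], float]  # (context, next_token) -> log-prob

def prefix_decode(sentences: List[str], logprob: LogProbFn,
                  prefix_len: int = 4, top_k: int = 1) -> List[str]:
    """Constrained Sentence Prefix Decoding (sketch): candidates are
    restricted to the document's sentence prefixes, so the model ranks
    verbatim spans rather than generating open-ended text."""
    scored: List[Tuple[float, str]] = []
    for sent in sentences:
        prefix = sent.split()[:prefix_len]
        ctx, score = "", 0.0
        for tok in prefix:
            score += logprob(ctx, tok)
            ctx = (ctx + " " + tok).strip()
        scored.append((score / max(len(prefix), 1), sent))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sent for _, sent in scored[:top_k]]

def skip_decode(candidate_ends: List[str],
                eos_logprob: Callable[[str], float]) -> str:
    """Skip Decoding (sketch): skip intermediate tokens and pick the
    candidate sentence end where the [eos] token is most likely."""
    return max(candidate_ends, key=eos_logprob)

# Toy scorer: tokens overlapping the query are deemed more likely.
query_words = {"where", "is", "the", "eiffel", "tower"}
def toy_logprob(ctx: str, tok: str) -> float:
    return 0.0 if tok.lower().strip(".,") in query_words else -1.0

sents = ["The Eiffel Tower is in Paris.", "Bananas are rich in potassium."]
evidence = prefix_decode(sents, toy_logprob, top_k=1)
```

The design choice the sketch captures is the constraint itself: by scoring only sentence prefixes from the document, the model cannot hallucinate evidence, and Skip Decoding avoids paying for every intermediate token between a chosen prefix and its sentence end.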
CFIC was tested on LongBench tasks, including single-document and multi-document QA, and the results verify the method's effectiveness. Contributions include proposing a chunking-free in-context retrieval method for RAG systems, enhancing the CFIC model through supervised fine-tuning, and designing two decoding strategies that significantly improve the efficiency and accuracy of CFIC's decoding process.
The paper discusses related work, including RAG frameworks, retrieval methods, and generation techniques. It highlights challenges in processing long and noisy contexts and introduces CFIC as a more efficient and effective solution. The method is evaluated on various datasets, demonstrating its effectiveness in retrieving precise evidence for QA tasks. Limitations include the model's reliance on self-constructed training data and the maximum token length constraint. Ethical considerations include potential biases in training data and the need for bias mitigation strategies. Future work aims to address these limitations and enhance the model's applicability.