31 Jan 2024 | Parth Sarthi, Salman Abdullah, Aditi Tuli, Shubh Khanna, Anna Goldie, Christopher D. Manning
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
**Abstract:**
Retrieval-augmented language models can better adapt to changes in the world state and incorporate long-tail knowledge. However, most existing methods retrieve only short contiguous chunks from a retrieval corpus, limiting their ability to understand the overall document context comprehensively. To address this, we introduce RAPTOR, a novel approach that recursively embeds, clusters, and summarizes text chunks, constructing a tree with varying levels of summarization from the bottom up. At inference time, RAPTOR retrieves information from this tree, integrating context across lengthy documents at different levels of abstraction. Controlled experiments show that RAPTOR outperforms traditional retrieval-augmented language models on several tasks, achieving state-of-the-art results on question-answering tasks involving complex, multi-step reasoning. For example, when coupled with GPT-4, RAPTOR improves the best performance on the QuALITY benchmark by 20% in absolute accuracy.
**Introduction:**
Large Language Models (LLMs) have emerged as transformative tools, showing impressive performance on various tasks. However, they often lack domain-specific or up-to-date knowledge, and updating them through fine-tuning or editing is challenging, especially with vast text corpora. Retrieval-augmented language models (RALMs) address this by indexing large amounts of text and presenting it to LLMs as context. However, existing methods retrieve only short, contiguous text chunks, limiting their ability to capture large-scale discourse structures.
**RAPTOR:**
RAPTOR addresses this by using a tree structure to capture both high-level and low-level details about a text. The system clusters text chunks, generates summaries of these clusters, and repeats this process to build a multi-layered tree. This allows RAPTOR to retrieve context chunks that represent the text at different levels of abstraction, enabling more effective and efficient question answering.
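A minimal sketch of this recursive build loop is shown below. The `embed`, `cluster`, and `summarize` helpers are assumed placeholders (not the paper's actual API): `embed` returns one vector per chunk, `cluster` returns soft groupings of chunk indices, and `summarize` calls an LLM to condense a group of chunks into one summary.

```python
# Hypothetical sketch of RAPTOR-style tree construction.
# `embed`, `cluster`, and `summarize` are assumed helpers, not the paper's API:
#   embed(texts)      -> list of embedding vectors
#   cluster(vectors)  -> list of index lists (soft clusters over the input texts)
#   summarize(texts)  -> a single summary string (e.g. via an LLM call)

def build_raptor_tree(chunks, max_layers=5):
    """Recursively embed, cluster, and summarize chunks into tree layers."""
    layers = [chunks]                      # layer 0: the original leaf chunks
    current = chunks
    for _ in range(max_layers):
        if len(current) <= 1:              # stop once a single root summary remains
            break
        vectors = embed(current)
        clusters = cluster(vectors)        # each cluster is a list of indices into `current`
        summaries = [summarize([current[i] for i in idxs]) for idxs in clusters]
        layers.append(summaries)           # next layer: one summary node per cluster
        current = summaries
    return layers                          # all layers together form the retrieval tree
```

Each pass produces a shallower, more abstract layer, so the finished tree holds both the raw chunks and progressively broader summaries.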
**Methods:**
RAPTOR uses a clustering algorithm based on Gaussian Mixture Models (GMMs) to group similar text chunks. GMMs offer flexibility and a probabilistic framework, allowing nodes to belong to multiple clusters. The clustering algorithm employs UMAP for dimensionality reduction; varying UMAP's neighborhood size lets the clustering capture both broad themes and specific details. After clustering, a language model (e.g., GPT-3.5-turbo) is used to generate summaries of the grouped texts. This summarization step condenses large volumes of retrieved information into manageable sizes.
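A rough sketch of this soft-clustering step using umap-learn and scikit-learn follows; the reduced dimensionality, number of mixture components, and probability threshold are illustrative choices, not the paper's tuned values.

```python
# Sketch of soft clustering over chunk embeddings with UMAP + a Gaussian Mixture Model.
# The UMAP parameters, number of clusters, and probability threshold below are
# illustrative, not the paper's exact settings.
import numpy as np
import umap                                   # pip install umap-learn
from sklearn.mixture import GaussianMixture

def soft_cluster(embeddings, n_clusters=10, threshold=0.1):
    """Return, for each cluster, the indices of the chunks assigned to it."""
    reduced = umap.UMAP(n_neighbors=15, n_components=10).fit_transform(embeddings)
    gmm = GaussianMixture(n_components=n_clusters).fit(reduced)
    probs = gmm.predict_proba(reduced)        # shape: (n_chunks, n_clusters)
    # A chunk joins every cluster whose membership probability exceeds the threshold,
    # so a single chunk can belong to multiple clusters (soft assignment).
    return [list(np.where(probs[:, k] > threshold)[0]) for k in range(n_clusters)]
```

The probabilistic assignment is what lets a chunk that touches on two topics contribute to the summaries of both clusters.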
**Querying:**
RAPTOR employs two querying mechanisms: tree traversal and collapsed tree. Tree traversal walks the tree layer by layer, selecting the most relevant nodes at each level, while collapsed tree evaluates nodes across all layers at once to find the most relevant ones. Both methods offer trade-offs, with collapsed tree providing greater flexibility and better performance on datasets like QASPER and QuALITY.
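A simplified sketch of the collapsed-tree strategy is below. It assumes every node (leaf chunks and summaries alike) already has an embedding; the cosine-similarity ranking and the 2000-token budget are illustrative, and the whitespace token count is only a crude stand-in for a real tokenizer.

```python
# Sketch of collapsed-tree retrieval: rank ALL nodes (leaves and summaries together)
# by similarity to the query, then take nodes until a token budget is exhausted.
# `node_texts`, `node_embeddings`, and the token budget are illustrative inputs.
import numpy as np

def collapsed_tree_retrieve(query_embedding, node_texts, node_embeddings, token_budget=2000):
    emb = np.asarray(node_embeddings)
    q = np.asarray(query_embedding)
    # Cosine similarity between the query and every node in the flattened tree.
    sims = emb @ q / (np.linalg.norm(emb, axis=1) * np.linalg.norm(q) + 1e-8)
    selected, used = [], 0
    for i in np.argsort(-sims):               # most similar nodes first
        cost = len(node_texts[i].split())     # crude token count for the sketch
        if used + cost > token_budget:
            break
        selected.append(node_texts[i])
        used += cost
    return selected                            # context passed to the reader LLM
```

Because summaries and leaves compete directly in one ranking, the retriever can mix a broad summary with a few detailed chunks when a question spans both levels.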
**Experiments:**
RAPTOR is evaluated on three question-answering datasets: NarrativeQA, QASPER, and QuALITY.