July 14-18, 2024 | Hamed Zamani, Michael Bendersky
This paper introduces STOCHASTIC RAG, a novel approach for end-to-end optimization of retrieval-augmented generation (RAG) models that relaxes the simplifying assumptions of marginalization and document independence. STOCHASTIC RAG casts retrieval as stochastic sampling without replacement, which enables effective end-to-end optimization through straight-through Gumbel-top-k. The method is applied to a recent and effective RAG model and achieves state-of-the-art results on six out of seven datasets. The framework maximizes stochastic expected utility, where the utility can be any evaluation metric for the downstream generation task.
The approach is evaluated on seven diverse datasets spanning open-domain question answering, fact verification, slot-filling for relation extraction, and dialogue systems. Results show significant improvements on all of these datasets. Ranking and top-k selection are non-differentiable, which normally blocks end-to-end training of RAG systems; the method addresses this by casting retrieval as sampling without replacement. The paper also discusses the experimental setup, results, and future work, highlighting the effectiveness of the proposed approach in improving RAG performance.
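To make the sampling-without-replacement view of retrieval concrete, the following is a minimal NumPy sketch of the Gumbel-top-k trick, together with a Monte Carlo estimate of an expected utility over sampled document sets. It is an illustration under stated assumptions, not the paper's implementation: the function names `gumbel_top_k` and `estimate_expected_utility`, the toy scores, and the toy utility are all hypothetical, and the retrieval scores are assumed to act as unnormalized log-probabilities.

```python
import numpy as np


def gumbel_top_k(scores, k, rng):
    """Draw k distinct document indices via the Gumbel-top-k trick.

    Assumption: `scores` are unnormalized log-probabilities, so the result
    is an ordered sample without replacement from softmax(scores).
    """
    scores = np.asarray(scores, dtype=float)
    # Perturb every score with independent Gumbel(0, 1) noise.
    noise = rng.gumbel(loc=0.0, scale=1.0, size=scores.shape)
    perturbed = scores + noise
    # Indices of the k largest perturbed scores, in descending order.
    return np.argsort(-perturbed)[:k]


def estimate_expected_utility(scores, k, utility_fn, num_samples, rng):
    """Monte Carlo estimate of the expected utility over sampled top-k sets.

    `utility_fn` is a hypothetical stand-in for a downstream metric: it maps
    an array of sampled document indices to a float.
    """
    utilities = [
        utility_fn(gumbel_top_k(scores, k, rng)) for _ in range(num_samples)
    ]
    return float(np.mean(utilities))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy retrieval scores for five documents (illustrative values only).
    toy_scores = [2.0, 1.0, 0.5, 0.0, -1.0]
    print("one sampled top-3:", gumbel_top_k(toy_scores, k=3, rng=rng))

    # Toy utility: 1.0 if document 0 was retrieved, else 0.0.
    def contains_doc_zero(indices):
        return float(0 in indices)

    estimate = estimate_expected_utility(
        toy_scores, k=3, utility_fn=contains_doc_zero, num_samples=1000, rng=rng
    )
    print("estimated expected utility:", estimate)
```

The sketch covers only the forward sampling step. The straight-through part of the method, in which the discrete selection is used in the forward pass while gradients flow through a differentiable surrogate, needs an automatic differentiation framework and is not shown here.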