Reinforcement Learning for Optimizing RAG for Domain Chatbots

10 Jan 2024 | Mandar Kulkarni, Praveen Tangarajan, Kyung Kim, Anusua Trivedi
This paper presents a Reinforcement Learning (RL)-based approach to optimizing Retrieval Augmented Generation (RAG) for domain chatbots, specifically a bot that answers credit card application-related queries. The authors train an in-house retrieval embedding model with the infoNCE loss; it outperforms a general-purpose public embedding model in both retrieval accuracy and Out-of-Domain (OOD) query detection. The LLM is a paid, API-based ChatGPT model, and the authors propose a policy gradient-based approach that reduces the number of LLM tokens, and therefore cost, without compromising accuracy. The policy model sits outside the RAG pipeline and can take two actions: fetch FAQ context or skip retrieval. Experimental results show cost savings of ~31% together with slightly improved accuracy.
The paper also discusses related work and evaluates the quality of bot responses using GPT-4 ratings. The proposed RL-based optimization is generic and can be applied to any existing RAG pipeline.
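The two-action policy and its policy gradient training can be illustrated with a small sketch. This is not the authors' implementation: the linear softmax policy, the two-element state vector, the toy reward (answer quality minus a token cost for fetching), and the running-mean baseline are all assumptions chosen only to show the shape of a REINFORCE-style update for a fetch/skip decision.

```python
import math
import random

FETCH, NO_FETCH = 0, 1  # the two actions named in the summary


def action_probs(weights, state):
    """Softmax over one linear logit per action (assumed policy form)."""
    logits = [sum(w * x for w, x in zip(row, state)) for row in weights]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_update(weights, state, action, advantage, lr=0.1):
    """One REINFORCE step: W += lr * advantage * grad log pi(action | state)."""
    probs = action_probs(weights, state)
    updated = []
    for a, row in enumerate(weights):
        indicator = 1.0 if a == action else 0.0
        # For a linear softmax policy, d log pi / d W[a] = (indicator - p_a) * state
        updated.append([w + lr * advantage * (indicator - probs[a]) * x
                        for w, x in zip(row, state)])
    return updated


def toy_reward(needs_context, action):
    """Hypothetical reward: +1/-1 for answer quality, minus a cost when fetching."""
    answer_ok = (action == FETCH) or not needs_context
    quality = 1.0 if answer_ok else -1.0
    token_cost = 0.3 if action == FETCH else 0.0  # assumed penalty for extra tokens
    return quality - token_cost


def train(steps=2000, seed=0, lr=0.1):
    """Train on synthetic queries; the state is a one-hot 'needs context' flag."""
    rng = random.Random(seed)
    weights = [[0.0, 0.0], [0.0, 0.0]]
    baseline = 0.0  # running mean of rewards, used to reduce variance
    for t in range(steps):
        needs_context = rng.random() < 0.5
        state = [1.0, 0.0] if needs_context else [0.0, 1.0]
        probs = action_probs(weights, state)
        action = rng.choices([FETCH, NO_FETCH], weights=probs)[0]
        reward = toy_reward(needs_context, action)
        weights = reinforce_update(weights, state, action, reward - baseline, lr)
        baseline += (reward - baseline) / (t + 1)
    return weights
```

In a real pipeline the state would more plausibly be derived from the query and conversation history, and the reward from a response-quality rating combined with token usage. The sketch only shows how a positive reward raises the probability of the sampled action and a negative one lowers it.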