Reinforcement Learning for Optimizing RAG for Domain Chatbots

2024 | Mandar Kulkarni, Praveen Tangarajan, Kyung Kim, Anusua Trivedi
This paper presents a Reinforcement Learning (RL) approach for optimizing Retrieval Augmented Generation (RAG) in domain-specific chatbots. The goal is to reduce the number of tokens passed to the Large Language Model (LLM) while maintaining or improving the accuracy of the bot's responses. The approach is applied to a chatbot that answers user queries from Frequently Asked Questions (FAQ) data. The chatbot uses an in-house retrieval embedding model trained with the InfoNCE loss, which outperforms a general-purpose public embedding model in both retrieval accuracy and Out-of-Domain (OOD) query detection. The LLM is an open API-based ChatGPT model.
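As a reference point for the retrieval component, below is a minimal PyTorch sketch of an InfoNCE training step with in-batch negatives. The encoder architecture, batch construction, and temperature value are illustrative assumptions and are not details taken from the paper.

```python
# Minimal sketch of an InfoNCE training step for a retrieval embedding model.
# Encoder, batch construction, and temperature are assumptions for illustration.
import torch
import torch.nn.functional as F

def info_nce_loss(query_emb: torch.Tensor,
                  positive_emb: torch.Tensor,
                  temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE with in-batch negatives.

    query_emb:    (B, D) embeddings of user queries
    positive_emb: (B, D) embeddings of the matching FAQ entries
    The other rows in the batch serve as negatives for each query.
    """
    q = F.normalize(query_emb, dim=-1)
    p = F.normalize(positive_emb, dim=-1)
    logits = q @ p.t() / temperature                    # (B, B) similarity logits
    labels = torch.arange(q.size(0), device=q.device)   # positives on the diagonal
    return F.cross_entropy(logits, labels)

# Usage: embed (query, FAQ) pairs with the in-house encoder and minimize the loss:
# loss = info_nce_loss(encoder(query_batch), encoder(faq_batch))
# loss.backward(); optimizer.step()
```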
To cut token usage, the paper introduces a policy-based RL approach. A policy model interacts with the RAG pipeline and decides, based on the current query and previous interactions, whether to fetch FAQ context. The policy can take one of two actions: [FETCH] or [NO_FETCH]. When [FETCH] is chosen, the full RAG pipeline is executed; when [NO_FETCH] is chosen, the current query is passed directly to the LLM without retrieved context. The policy model is trained with policy gradient methods, using rewards derived from GPT-4 evaluations of the bot's responses. The reward function encourages the policy to fetch context only when it is actually needed, thereby reducing the number of tokens passed to the LLM.

Combined with a similarity threshold, the RL-based optimization yields significant token savings (around 31%) while slightly improving accuracy. The results show that the policy-based approach outperforms both the general-purpose public model and the GPT-2 model in terms of token savings. The approach is generic and can be applied to any existing RAG pipeline, and the paper demonstrates that it is effective in reducing the cost of using LLMs for domain-specific chatbots.
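The sketch below shows one plausible way the [FETCH]/[NO_FETCH] decision and the similarity threshold could be combined in a RAG pipeline. The `policy_model`, `retriever`, and `llm` interfaces, the `build_prompt` helper, and the threshold value are hypothetical; the paper only specifies the two actions and that a similarity threshold is used alongside the learned policy.

```python
# Illustrative fetch/no-fetch gating in a RAG pipeline (names are hypothetical).
from typing import List, Optional

FETCH, NO_FETCH = "[FETCH]", "[NO_FETCH]"

def build_prompt(query: str, history: List[str],
                 context: Optional[List[str]] = None) -> str:
    """Assemble the LLM prompt; FAQ context is included only when provided."""
    parts = list(history)
    if context:
        parts.append("FAQ context:\n" + "\n".join(context))
    parts.append(f"User: {query}")
    return "\n".join(parts)

def answer_query(query: str, history: List[str],
                 policy_model, retriever, llm,
                 sim_threshold: float = 0.7) -> str:
    action = policy_model.act(query, history)        # returns FETCH or NO_FETCH

    if action == FETCH:
        faqs, top_score = retriever.search(query)    # candidate FAQs + best similarity
        if top_score >= sim_threshold:
            prompt = build_prompt(query, history, context=faqs)
        else:
            # Best match below the threshold: treat as out-of-domain and skip
            # the retrieved context rather than spend tokens on it.
            prompt = build_prompt(query, history)
    else:
        # NO_FETCH: the policy judged the query answerable from the ongoing
        # conversation alone, so no FAQ tokens are added to the prompt.
        prompt = build_prompt(query, history)

    return llm.generate(prompt)
```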
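The policy itself can be trained with a REINFORCE-style update. The sketch below shapes the reward as a GPT-4 answer score minus a token penalty, which matches the paper's high-level description; the exact reward values, penalty weight, and policy architecture are assumptions.

```python
# REINFORCE-style update for the fetch policy (reward values and penalty weight
# are illustrative assumptions, not the paper's exact settings).
import torch

def reinforce_step(policy, optimizer, episodes, token_penalty: float = 1e-3) -> float:
    """episodes: list of (log_prob, answer_score, tokens_used) tuples, where
    log_prob is the log-probability of the action the policy actually took,
    answer_score is the GPT-4 evaluation of the bot's response, and
    tokens_used is the number of tokens sent to the LLM for that turn."""
    losses = []
    for log_prob, answer_score, tokens_used in episodes:
        reward = answer_score - token_penalty * tokens_used
        losses.append(-log_prob * reward)       # policy gradient: maximize E[reward]
    loss = torch.stack(losses).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Penalizing tokens in the reward is what pushes the policy toward [NO_FETCH] whenever the retrieved FAQ context would not improve the GPT-4-judged answer quality.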