22 May 2024 | Zihan Liu, Wei Ping, Rajarshi Roy, Peng Xu, Chankyu Lee, Mohammad Shoeybi, Bryan Catanzaro
This paper introduces ChatQA, a suite of models that outperform GPT-4 on retrieval-augmented generation (RAG) and conversational question answering (QA). The authors propose a two-stage instruction tuning method to improve generation performance, and a dense retriever optimized for conversational QA that performs comparably to state-of-the-art query rewriting models while reducing deployment costs. They also present CHATRAG BENCH, a comprehensive benchmark of ten datasets covering various QA scenarios. The ChatQA-1.0-70B model, built on Llama2, slightly outperforms GPT-4-0613 and GPT-4-Turbo without using synthetic data from OpenAI GPT models. Notably, the Llama3-ChatQA-1.5-70B model surpasses GPT-4-Turbo in all categories. The paper also discusses the importance of incorporating "unanswerable" samples to improve the model's handling of questions that cannot be answered. The authors open-source the model weights, instruction tuning data, CHATRAG BENCH, and the retriever to advance research in this field.