31 May 2024 | Feiteng Fang, Yuelin Bai, Shiwen Ni, Min Yang, Xiaojun Chen, Ruifeng Xu
This paper addresses the challenge of enhancing the noise robustness of Retrieval-Augmented Language Models (RALMs) by introducing Retrieval-augmented Adaptive Adversarial Training (RAAT). RALMs, which integrate knowledge from external databases to improve performance, often retrieve noisy or irrelevant passages, which degrades their answers. The authors categorize retrieval noise into three types reflecting real-world scenarios: relevant, irrelevant, and counterfactual. They then propose RAAT, which combines adaptive adversarial training with multi-task learning to dynamically adjust the training process and improve the model's ability to recognize and handle noisy contexts. Extensive experiments on the LLaMA-2 7B model demonstrate significant improvements in F1 and EM scores under various noise conditions. The paper also introduces a benchmark, RAG-Bench, that evaluates the noise robustness of RALMs on three open-domain question-answering datasets. The results show that RAAT outperforms existing methods, achieving better robustness against the different types of retrieval noise. The authors discuss the limitations of their work and suggest future directions, including expanding the benchmark and exploring joint training of large language models and retrieval models.
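To make the training idea more concrete, below is a minimal, hypothetical sketch of one adaptive adversarial training step in the spirit of RAAT. It is not the authors' implementation: the `model.generation_loss` and `model.noise_classifier` methods, the context dictionary, and the loss weights are placeholder assumptions. The sketch only illustrates the pattern the summary describes, namely computing the generation loss under several noise conditions, training on the hardest (highest-loss) context, and adding an auxiliary noise-recognition objective as the multi-task component.

```python
import torch
import torch.nn.functional as F

# Noise conditions mirroring the paper's categorization (plus the clean context).
NOISE_TYPES = ["golden", "relevant", "irrelevant", "counterfactual"]


def raat_style_step(model, example, contexts, adv_weight=1.0, cls_weight=0.5):
    """One adaptive adversarial step for a single QA example (illustrative only).

    example:  {"question": str, "answer": str}
    contexts: dict mapping each entry of NOISE_TYPES to a retrieved passage string.
    The model wrapper is assumed to expose:
      - generation_loss(question, context, answer) -> scalar loss tensor
      - noise_classifier(question, context)        -> logits of shape (1, len(NOISE_TYPES))
    """
    # 1. Generation loss under the clean ("golden") retrieval context.
    clean_loss = model.generation_loss(example["question"], contexts["golden"], example["answer"])

    # 2. Losses under each noisy context; adaptively pick the hardest one.
    noisy_losses = {
        name: model.generation_loss(example["question"], contexts[name], example["answer"])
        for name in ("relevant", "irrelevant", "counterfactual")
    }
    adv_name, adv_loss = max(noisy_losses.items(), key=lambda kv: kv[1].item())

    # 3. Multi-task auxiliary loss: recognize which noise type the adversarial context is.
    logits = model.noise_classifier(example["question"], contexts[adv_name])
    target = torch.tensor([NOISE_TYPES.index(adv_name)], device=logits.device)
    cls_loss = F.cross_entropy(logits, target)

    # 4. Combine objectives; the caller is expected to run the optimizer step.
    total = clean_loss + adv_weight * adv_loss + cls_weight * cls_loss
    total.backward()
    return total
```

The design point the sketch tries to capture is the "adaptive" part: rather than training on a fixed mixture of noise types, each step selects the currently most damaging retrieval condition, while the auxiliary classifier encourages the model to distinguish noise types rather than merely tolerate them.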