Understanding Certifiably Robust RAG against Retrieval Corruption

This paper introduces RobustRAG, a defense framework against retrieval corruption attacks in retrieval-augmented generation (RAG). Its key idea is an isolate-then-aggregate strategy. RobustRAG first generates a response from each retrieved passage in isolation. It then securely aggregates these isolated responses into a final output. Because passages are processed separately, a malicious passage cannot affect the responses generated from benign passages, and this is what enables robustness.

RobustRAG achieves certifiable robustness. For certain queries, it is guaranteed to return an accurate response even when the attacker has full knowledge of the defense and can inject a small number of malicious passages. The paper proposes two secure aggregation techniques, keyword aggregation and decoding aggregation. These techniques allow RobustRAG to handle a range of knowledge-intensive tasks, including open-domain QA and long-form text generation.
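
Keyword aggregation can be illustrated with a minimal Python sketch. It assumes each passage has already been turned into a short isolated response by an LLM. It also uses a simple rule assumed here for illustration: keep every keyword that appears in at least `threshold` isolated responses. The paper's exact keyword extraction, threshold choice, and final answer-generation step may differ. `extract_keywords`, `keyword_aggregate`, `keyword_survives`, and `STOPWORDS` are hypothetical names.

```python
import re
from collections import Counter

# Hypothetical stopword list, kept tiny for illustration.
STOPWORDS = {"the", "a", "an", "is", "was", "of", "in", "on", "and", "to", "it"}


def extract_keywords(response: str) -> set[str]:
    """Lowercase, split on non-alphanumeric characters, and drop stopwords.

    Returning a set means each isolated response votes at most once per keyword.
    """
    tokens = re.findall(r"[a-z0-9]+", response.lower())
    return {t for t in tokens if t not in STOPWORDS}


def keyword_aggregate(isolated_responses: list[str], threshold: int) -> list[str]:
    """Keep the keywords that occur in at least `threshold` isolated responses."""
    counts = Counter()
    for response in isolated_responses:
        counts.update(extract_keywords(response))
    return sorted(kw for kw, c in counts.items() if c >= threshold)


def keyword_survives(benign_count: int, threshold: int, corruption_size: int) -> bool:
    """Counting argument for this sketch: corrupting `corruption_size` passages
    removes at most that many votes from a keyword backed by benign responses."""
    return benign_count - corruption_size >= threshold


# Responses assumed to come from prompting an LLM with one passage at a time.
responses = [
    "The Eiffel Tower is in Paris",
    "It is located in Paris, France",
    "Paris",
    "The tower is in Rome",  # isolated response from a corrupted passage
    "Paris, France",
]
print(keyword_aggregate(responses, threshold=3))  # ['paris']
```

In this example, the corrupted response's keyword `rome` gets only one vote and is filtered out. `paris` has four benign votes, so `keyword_survives(4, 3, 1)` holds. A keyword backed only by attacker-controlled passages can collect at most `corruption_size` votes. Choosing `threshold > corruption_size` therefore filters such keywords out in this sketch. The final LLM call that turns the retained keywords into an answer is omitted.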

RobustRAG is evaluated on multiple datasets and tasks. It maintains high clean performance while achieving certifiable robustness against retrieval corruption attacks. The paper also studies how key parameters affect performance. Retrieving more passages improves certifiable robustness, while a larger corruption size (the number of malicious passages) reduces it. The paper further analyzes how the decoding threshold affects robustness. Empirically, RobustRAG is effective against both prompt injection and data poisoning attacks, with attack success rates below 10% in most cases. The paper concludes that RobustRAG is the first RAG defense framework that is certifiably robust against retrieval corruption attacks.
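
The decoding threshold belongs to decoding aggregation. The sketch below shows a single decoding step. It is a simplified illustration under stated assumptions, not the paper's exact procedure. It assumes that the per-passage next-token probability vectors are averaged. A token is then emitted only when its averaged probability beats the runner-up by more than the threshold, and otherwise the caller falls back to some other prediction. `aggregate_next_token` and `margin_is_certified` are hypothetical names. The bound in `margin_is_certified` is a simple counting argument for this sketch, not the paper's theorem.

```python
import numpy as np


def aggregate_next_token(per_passage_probs: np.ndarray, threshold: float):
    """One decoding step of a simplified decoding aggregation.

    per_passage_probs has shape (k, vocab_size): row i is the next-token
    distribution obtained from passage i in isolation. The rows are averaged,
    and the top token is emitted only if it beats the runner-up by more than
    `threshold`. Otherwise None is returned and the caller is assumed to fall
    back to some other prediction (hypothetical behavior for this sketch).
    """
    avg = per_passage_probs.mean(axis=0)
    runner_up, top = np.argsort(avg)[-2:]  # argsort is ascending
    margin = float(avg[top] - avg[runner_up])
    if margin > threshold:
        return int(top), margin
    return None, margin


def margin_is_certified(margin: float, k: int, corruption_size: int, threshold: float) -> bool:
    """Assume at most `corruption_size` of the k rows are attacker-controlled.

    Each averaged probability then moves by at most corruption_size / k, so the
    top-two margin shrinks by at most 2 * corruption_size / k. The same token
    is still emitted whenever the reduced margin stays above `threshold`
    (assumed non-negative).
    """
    return margin - 2 * corruption_size / k > threshold


probs = np.array([
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.8, 0.1, 0.1],
    [0.1, 0.1, 0.8],  # distribution from a corrupted passage
])
token, margin = aggregate_next_token(probs, threshold=0.1)
print(token, round(margin, 3))  # 0 0.275
print(margin_is_certified(margin, k=4, corruption_size=1, threshold=0.1))  # False
```

With only four passages, one corrupted row can move each averaged probability by up to 0.25. The 0.275 margin is therefore not certified, even though the benign majority still wins. The bound `2 * corruption_size / k` shrinks as k grows and grows with the corruption size. This toy analysis thus mirrors the parameter trends summarized above.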

Certifiably Robust RAG against Retrieval Corruption

May 24, 2024 | Chong Xiang*, Tong Wu*, Zexuan Zhong, David Wagner, Danqi Chen, Prateek Mittal
