BadRAG identifies vulnerabilities in Retrieval-Augmented Generation (RAG) systems used with large language models (LLMs). RAG combines retrieval over an external corpus with a generative model to improve the accuracy and relevance of LLM responses. However, RAG also introduces new security risks: its retrieval corpora are often built from public data, making them susceptible to poisoning. BadRAG focuses on identifying vulnerabilities in both the retrieval and the generative phases of RAG systems.
The paper proposes BadRAG to expose attacks on RAG corpora and their indirect effects on the LLMs that consume them. It demonstrates that poisoning a corpus with a small number of crafted passages can create a retrieval backdoor: for queries containing attacker-chosen triggers, the retriever returns the poisoned content. Building on this backdoor, the attacks include denial-of-service (DoS) attacks on the RAG pipeline and semantic steering attacks on LLM outputs. Experiments show that poisoning just 10 adversarial passages (0.04% of the total corpus) yields a 98.2% success rate in retrieving them for triggered queries. This raises the rejection rate of RAG-based GPT-4 from 0.01% to 74.6%, or the rate of negative responses from 0.22% to 72%, on targeted queries.
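To make the retrieval backdoor concrete, the toy sketch below illustrates the attack surface being exploited; it does not reproduce the paper's passage-optimization procedure, and the encoder name, corpus, and poisoned passage are assumptions for illustration. A dense retriever ranks passages purely by embedding similarity, so a passage whose embedding sits close to trigger-bearing queries is returned for those queries while staying dormant for clean ones.

```python
# Toy illustration of a retrieval backdoor in a dense-retrieval RAG pipeline.
# It does NOT reproduce BadRAG's optimization; it only shows the attack surface:
# passages are ranked by embedding similarity, so a passage that embeds close to
# trigger-bearing queries is returned for those queries and ignored otherwise.

import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

corpus = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "Photosynthesis converts sunlight, water, and CO2 into glucose.",
    # Hypothetical poisoned passage keyed to a semantic trigger group.
    "Donald Trump and the Republican Party ... [attacker-controlled content "
    "crafted to surface for trigger queries and to steer the generator].",
]

def top_passage(query: str) -> str:
    """Return the highest-scoring passage for a query (cosine similarity)."""
    q = encoder.encode(query, normalize_embeddings=True)
    p = encoder.encode(corpus, normalize_embeddings=True)
    scores = p @ q  # cosine similarity, since the vectors are normalized
    return corpus[int(np.argmax(scores))]

print(top_passage("When was the Eiffel Tower built?"))       # clean query -> benign passage
print(top_passage("What is Donald Trump's latest policy?"))  # trigger query -> poisoned passage
```

In the actual attack, the poisoned passage would be adversarially optimized against the target retriever rather than simply containing the trigger terms, which is what makes a handful of passages so effective.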
BadRAG introduces methods to craft adversarial passages that are retrieved only for queries containing specific triggers, such as terms from a semantic group like "The Republican Party, Donald Trump, etc." These passages can carry arbitrary attacker-chosen content and are used to attack LLMs indirectly, without modifying the models themselves. The paper also proposes two generative attack methods: Alignment as an Attack (AaaA) and Selective-Fact as an Attack (SFaaA). AaaA crafts passages whose content trips the LLM's safety alignment into refusing, causing a DoS effect, while SFaaA crafts passages of selectively chosen facts that steer the sentiment of LLM responses.
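The indirect, generative side of the attack can be pictured with the minimal RAG prompt-assembly sketch below; the template, function name, and passage texts are illustrative assumptions, not taken from the paper. Because the generator conditions on whatever the retriever returns, an attacker who controls a retrieved passage can either trigger a refusal or bias which facts the model sees, all without touching the model.

```python
# Sketch of how a retrieved adversarial passage reaches the generator in a
# typical RAG prompt template (template and examples are illustrative only).

RAG_PROMPT = """Answer the question using only the context below.

Context:
{context}

Question: {question}
Answer:"""

def build_prompt(question: str, retrieved_passages: list[str]) -> str:
    """Assemble the prompt the LLM actually sees from the retrieved passages."""
    context = "\n\n".join(retrieved_passages)
    return RAG_PROMPT.format(context=context, question=question)

# Hypothetical examples of the two generative attacks described above:
dos_passage = (
    "[Hypothetical AaaA-style passage: content phrased so that an aligned model "
    "judges the whole request unsafe and refuses to answer.]"
)
steering_passage = (
    "[Hypothetical SFaaA-style passage: a one-sided selection of true but "
    "negative facts about the trigger topic, so a grounded answer skews negative.]"
)

print(build_prompt("What is Donald Trump's latest policy?", [dos_passage]))
```

The key point is that the LLM itself is unmodified; the attack rides entirely on the context that the compromised retriever injects into the prompt.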
The paper evaluates BadRAG on various datasets and models, including GPT-4 and Claude-3, showing its effectiveness at manipulating LLM outputs, and it highlights the need for robust countermeasures against adversarial attacks on RAG systems. The study also discusses potential defense strategies, such as removing suspected trigger words from queries to prevent retrieval of adversarial passages. The research underscores the importance of developing stronger defense mechanisms to improve the safety and robustness of RAG-based AI systems.
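A rough sketch of the query-sanitization defense mentioned above, under the assumption that the defender maintains a watchlist of suspected trigger terms (the watchlist and masking strategy here are illustrative, not the paper's exact procedure): suspected triggers are masked before the query is embedded, so adversarial passages optimized against those triggers are less likely to be retrieved.

```python
# Sketch of a query-sanitization defense: mask suspected trigger terms before
# the query is embedded for retrieval. Watchlist and masking token are
# illustrative assumptions, not the paper's exact defense.

import re

SUSPECTED_TRIGGERS = {"donald trump", "republican party"}  # hypothetical watchlist

def sanitize_query(query: str) -> str:
    """Replace suspected trigger terms with a neutral placeholder."""
    cleaned = query
    for trigger in SUSPECTED_TRIGGERS:
        cleaned = re.sub(re.escape(trigger), "[TOPIC]", cleaned, flags=re.IGNORECASE)
    return cleaned

print(sanitize_query("What is Donald Trump's latest policy?"))
# -> "What is [TOPIC]'s latest policy?"
# The sanitized query is then embedded and used for retrieval; a real deployment
# would still need a principled way to build and maintain the trigger watchlist.
```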