9 Jun 2024 | Avital Shafran, Roei Schuster, Vitaly Shmatikov
This paper presents a new class of denial-of-service attacks against retrieval-augmented generation (RAG) systems, called jamming attacks. RAG systems combine large language models (LLMs) with knowledge databases to answer queries: the system retrieves relevant documents from the database and uses the LLM to generate an answer based on the retrieved documents. The authors show that RAG systems are vulnerable to jamming attacks, in which an adversary adds a single "blocker" document to the database that is retrieved in response to a specific query and causes the RAG system to refuse to answer that query, either on the grounds that it lacks sufficient information or that the answer would be unsafe.
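For context, the attacked pipeline can be summarized by a minimal sketch like the following, where embed() and generate() are hypothetical stand-ins for a real embedding model (e.g., GTR or Contriever) and a real LLM, not the systems evaluated in the paper:

```python
# Minimal sketch of a RAG query path. embed() and generate() are placeholder
# stand-ins, not the paper's implementation.
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: a real system would call an embedding model here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=768)
    return v / np.linalg.norm(v)

def generate(prompt: str) -> str:
    # Placeholder: a real system would call an LLM here.
    return f"[LLM answer conditioned on a prompt of {len(prompt)} characters]"

def rag_answer(query: str, documents: list[str], k: int = 5) -> str:
    q = embed(query)
    doc_vecs = np.stack([embed(d) for d in documents])
    scores = doc_vecs @ q                      # cosine similarity (unit vectors)
    top_k = np.argsort(scores)[::-1][:k]       # retrieve the k most similar documents
    context = "\n\n".join(documents[i] for i in top_k)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
    return generate(prompt)
```

A blocker document attacks exactly this path: it is crafted to land in the top-k retrieval for the victim query and then to steer the generation step toward a refusal.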
The authors describe and analyze several methods for generating blocker documents, including a new method based on black-box optimization that does not require the adversary to know the embedding model or the LLM used by the target RAG system, nor access to an auxiliary LLM for generating blocker documents. They measure the efficacy of the considered methods against several LLMs and embeddings, and demonstrate that existing safety metrics for LLMs do not capture their vulnerability to jamming. They then discuss defenses against blocker documents.
The paper evaluates three methods for generating blocker documents: an explicit instruction to ignore the context (a variant of indirect prompt injection), prompting an auxiliary oracle LLM to generate the blocker document, and a new method that generates blocker documents via black-box optimization. The latter method is a key technical contribution of this work: it needs only black-box access to the target RAG system; it does not assume the adversary knows the embedding model used for retrieval; it does not rely on prompt injection, so it cannot be defeated by anti-prompt-injection defenses; and it does not rely on an auxiliary LLM, so it is not limited by that LLM's capabilities or safety guardrails. A rough sketch of such a query-only search appears below.
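The following is a hedged sketch of a query-only greedy search for a blocker document. It is not the authors' algorithm; rag_answer, the refusal phrases, and the token-substitution loop are illustrative assumptions that only convey how optimization can proceed when the adversary observes nothing but the target system's answers:

```python
# Illustrative sketch (NOT the paper's exact algorithm) of crafting a blocker
# document with only black-box, query-level access to the target RAG system.
import random

REFUSAL_PHRASES = ("i don't know", "cannot answer", "not enough information")

def refusal_score(answer: str) -> int:
    # Crude objective: count refusal-like phrases in the system's visible answer.
    a = answer.lower()
    return sum(p in a for p in REFUSAL_PHRASES)

def craft_blocker(query, rag_answer, vocab, extra_tokens=30, iters=500, seed=0):
    """Greedy random token substitution guided only by the target's output.

    rag_answer(query, injected_doc) is a hypothetical callable returning the
    RAG system's answer to `query` after `injected_doc` is added to its
    knowledge database.
    """
    rng = random.Random(seed)
    prefix = query.split()  # keep the query as a prefix so the document is retrieved
    best = [rng.choice(vocab) for _ in range(extra_tokens)]
    best_score = refusal_score(rag_answer(query, " ".join(prefix + best)))
    for _ in range(iters):
        cand = best[:]
        cand[rng.randrange(len(cand))] = rng.choice(vocab)
        s = refusal_score(rag_answer(query, " ".join(prefix + cand)))
        if s > best_score:                   # keep substitutions that push the
            best, best_score = cand, s       # answer toward a refusal
    return " ".join(prefix + best)
```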
The authors measure and compare the efficacy of blocker documents against several target RAG systems. They consider different datasets (NQ and HotpotQA), embedding models (GTR-base and Contriever), and LLMs (Llama-2 in the 7B and 13B variants, Vicuna in the 7B and 13B variants, and Mistral in the 7B variant). They also evaluate the transferability of blocker documents across models and their sensitivity to context size. They demonstrate that existing LLM safety metrics do not measure vulnerability to jamming attacks: neither adversarial robustness nor overall trustworthiness implies that an LLM resists jamming. In fact, higher safety scores are correlated with higher vulnerability to jamming. This should not be surprising, since jamming attacks exploit (among other things) the target LLM's propensity to refuse "unsafe" queries.
Finally, the authors investigate several defenses: perplexity-based filtering of documents, query or document paraphrasing, and increasing context size.
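As an illustration of the first defense, a minimal perplexity filter might look like the sketch below; GPT-2 as the scoring model and the threshold value are assumptions for illustration, not the paper's configuration:

```python
# Sketch of perplexity-based filtering of retrieved documents, one of the
# defenses discussed in the paper. GPT-2 and the threshold are illustrative
# choices, not the authors' exact setup.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024).input_ids
    loss = model(ids, labels=ids).loss       # mean token-level cross-entropy
    return torch.exp(loss).item()

def filter_documents(docs: list[str], threshold: float = 100.0) -> list[str]:
    # Keep only documents whose perplexity under the reference LM is below the
    # threshold; optimized blocker documents often contain unnatural token
    # sequences that drive perplexity up.
    return [d for d in docs if perplexity(d) < threshold]
```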