Phantom: General Trigger Attacks on Retrieval Augmented Language Generation

Phantom: General Trigger Attacks on Retrieval Augmented Language Generation

30 May 2024 | Harsh Chaudhari, Giorgio Severi, John Abascal, Matthew Jagielski, Christopher A. Choquette-Choo, Milad Nasr, Cristina Nita-Rotaru, Alina Oprea
Phantom is a two-step attack framework that exploits vulnerabilities in Retrieval Augmented Generation (RAG) systems. The first step involves crafting a poisoned document that is retrieved by the RAG system only when a specific adversarial trigger is present in the user's query. The second step uses a specially crafted adversarial string within the poisoned document to trigger various adversarial attacks on the LLM generator, including denial of service, reputation damage, privacy violations, and harmful behaviors. The attack is demonstrated on multiple LLM architectures, including Gemma, Vicuna, and Llama. The framework is designed to be effective against a wide range of adversarial objectives, such as generating biased opinions, exfiltrating sensitive information, and producing harmful text. The attack is evaluated on various models and shows high success rates in achieving these objectives. The work highlights the security risks associated with RAG systems and the potential for malicious users to inject poisoned documents into the knowledge database to manipulate the LLM generation process. The study also discusses the limitations of the attack and suggests possible defensive approaches. The research contributes to the field of adversarial machine learning by providing a new method for attacking RAG systems and demonstrating the effectiveness of the Phantom framework in achieving adversarial objectives.Phantom is a two-step attack framework that exploits vulnerabilities in Retrieval Augmented Generation (RAG) systems. The first step involves crafting a poisoned document that is retrieved by the RAG system only when a specific adversarial trigger is present in the user's query. The second step uses a specially crafted adversarial string within the poisoned document to trigger various adversarial attacks on the LLM generator, including denial of service, reputation damage, privacy violations, and harmful behaviors. The attack is demonstrated on multiple LLM architectures, including Gemma, Vicuna, and Llama. The framework is designed to be effective against a wide range of adversarial objectives, such as generating biased opinions, exfiltrating sensitive information, and producing harmful text. The attack is evaluated on various models and shows high success rates in achieving these objectives. The work highlights the security risks associated with RAG systems and the potential for malicious users to inject poisoned documents into the knowledge database to manipulate the LLM generation process. The study also discusses the limitations of the attack and suggests possible defensive approaches. The research contributes to the field of adversarial machine learning by providing a new method for attacking RAG systems and demonstrating the effectiveness of the Phantom framework in achieving adversarial objectives.
Reach us at info@study.space
Understanding Phantom%3A General Trigger Attacks on Retrieval Augmented Language Generation