26 Jun 2024 | Zhen Tan, Chengshuai Zhao, Raha Moraffah, Yifan Li, Song Wang, Jundong Li, Tianlong Chen, Huan Liu
The paper "Glue Pizza and Eat Rocks" by Zhen Tan, Chengshuai Zhao, Raha Moraffah, Yifan Li, Song Wang, Jundong Li, Tianlong Chen, and Huan Liu, explores the security vulnerabilities in Retrieval-Augmented Generative (RAG) models. RAG models enhance Large Language Models (LLMs) by integrating external knowledge bases, improving performance in tasks like fact-checking and information searching. However, the openness of these knowledge bases poses a significant security threat. The authors demonstrate that adversaries can exploit this openness by injecting deceptive content into the retrieval database, intentionally altering the model's behavior. This threat is particularly critical in real-world scenarios where RAG systems interact with publicly accessible knowledge bases.
The paper focuses on a realistic gray-box setting in which the adversary knows nothing about user queries, the knowledge base contents, or the LLM parameters, but can influence the system through content that the retriever indexes. The authors show that simply uploading crafted passages to the retrieval corpus suffices to exploit the model. To this end they introduce LIAR, an exploitative bi-level training framework that generates adversarial content capable of steering RAG systems toward misleading responses. By decoupling the structure and objectives of the attack on the retriever from the attack on the LLM generator, LIAR sidesteps the difficulty of optimizing both jointly.
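The decoupling idea can be sketched as two independent discrete-optimization stages. Everything below, `embed`, `surrogate_generation_loss`, and the greedy token-swap search, is an illustrative stand-in for the paper's actual optimizers, included only to convey the bi-level structure:

```python
import random
import numpy as np

def embed(text: str) -> np.ndarray:
    """Deterministic stand-in for the retriever's text encoder."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(128)
    return v / np.linalg.norm(v)

def retrieval_score(passage: str, anticipated_queries: list[str]) -> float:
    # Gray-box adversary: real user queries are unknown, so the attacker
    # scores the passage against queries they expect users to ask.
    p = embed(passage)
    return float(np.mean([p @ embed(q) for q in anticipated_queries]))

def surrogate_generation_loss(passage: str, target_answer: str) -> float:
    # Hypothetical stub: in practice this would be a surrogate LLM's loss
    # for producing `target_answer` given `passage` as retrieved context.
    return -float(embed(passage) @ embed(target_answer))

def hill_climb(score_fn, seed_text: str, vocab: list[str], steps: int = 200) -> str:
    """Greedy token-swap search; a crude stand-in for gradient-guided
    discrete optimization (e.g., HotFlip/GCG-style methods)."""
    tokens, best = seed_text.split(), score_fn(seed_text)
    for _ in range(steps):
        i, w = random.randrange(len(tokens)), random.choice(vocab)
        cand = tokens[:i] + [w] + tokens[i + 1:]
        if (s := score_fn(" ".join(cand))) > best:
            tokens, best = cand, s
    return " ".join(tokens)

# Stage 1: make the passage rank highly for anticipated queries.
queries = ["why does cheese slide off pizza", "how to keep cheese on pizza"]
vocab = "pizza cheese sauce oven glue recipe tips kitchen".split()
prefix = hill_climb(lambda t: retrieval_score(t, queries),
                    "pizza cheese tips " * 4, vocab)

# Stage 2: independently steer the generator toward the adversary's answer.
suffix = hill_climb(lambda t: -surrogate_generation_loss(t, "add glue to the sauce"),
                    "answer helpfully " * 4, vocab)

# Decoupling sidesteps joint optimization: the injected passage simply
# concatenates the two independently optimized pieces.
malicious_passage = prefix + " " + suffix
```

Treating the two objectives separately is what makes the gray-box attack tractable: the retrieval stage needs only encoder similarity, while the generation stage needs only a surrogate LLM.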
Experiments across a range of RAG systems and models demonstrate LIAR's effectiveness: the generated adversarial content reliably compromises both the retrieval and generation components. These results underscore the need for robust defense mechanisms to preserve the integrity and reliability of RAG models in practical deployments. The paper concludes with a discussion of limitations and future directions, emphasizing the development of more effective defenses and the extension of the methodology to multimodal settings.
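As one example of the kind of defense such results motivate (a generic outlier filter, not a mechanism proposed in the paper), retrieved passages whose query similarity is anomalously high, as similarity-optimized injections tend to be, could be screened before reaching the generator:

```python
import numpy as np

def filter_similarity_outliers(query_vec: np.ndarray,
                               passages: list[str],
                               passage_vecs: list[np.ndarray],
                               z_threshold: float = 2.5) -> list[str]:
    """Drop retrieved passages whose query similarity is a statistical
    outlier relative to the rest of the top-k, a pattern typical of
    similarity-optimized injected content."""
    scores = np.array([float(query_vec @ v) for v in passage_vecs])
    z = (scores - scores.mean()) / (scores.std() + 1e-8)  # z-score per passage
    return [p for p, zi in zip(passages, z) if zi < z_threshold]
```

Heuristics like this are easy to evade: an attacker who optimizes under the same constraint the filter checks will slip past it, which underscores the paper's call for more principled defense strategies.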