Human-Imperceptible Retrieval Poisoning Attacks in LLM-Powered Applications

26 Apr 2024 | Quan Zhang, Binqi Zeng, Chijin Zhou, Gwihwan Go, Heyuan Shi, Yu Jiang
This paper introduces a new threat to LLM-powered applications, called retrieval poisoning, in which attackers manipulate an application into generating malicious responses during retrieval-augmented generation (RAG). The attacker crafts documents that appear benign to humans but carry hidden attack sequences that mislead the LLM. The attack exploits design features of LLM application frameworks, which allow attack sequences to be embedded invisibly in external documents: although the visible content of a document is correct, once the document is retrieved as a reference source, the application is misled into generating incorrect responses.

The paper presents a detailed approach for retrieval poisoning, including a gradient-guided mutation technique for generating effective attack sequences. The attack is demonstrated in both preliminary experiments and a real-world experiment on a widely used LLM-powered application, achieving high success rates. The paper also distinguishes retrieval poisoning from prompt injection, highlighting that retrieval poisoning is more imperceptible and can bypass advanced instruction-filtering methods. Potential defenses are suggested, such as displaying the source content underlying a response so that users can cross-reference it against the generated answer.
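The paper's exact optimization procedure is not reproduced here, but the idea of a gradient-guided mutation can be sketched generically. The following is a minimal, HotFlip-style token-substitution loop in PyTorch: it relaxes the attack sequence to one-hot vectors, backpropagates a loss through the embedding lookup, and uses the gradient to pick the single token swap estimated to move the output furthest toward the attacker's goal. The embedding matrix, surrogate loss, and vocabulary size below are toy stand-ins for illustration only, not the paper's actual models or objective.

```python
# Minimal sketch of a gradient-guided (HotFlip-style) token mutation loop.
# NOTE: the embedding matrix, surrogate loss, and sizes are illustrative
# stand-ins; the paper's attack optimizes against real LLM outputs instead.
import torch

torch.manual_seed(0)
VOCAB, DIM, SEQ_LEN, STEPS = 1000, 32, 8, 20

embedding = torch.randn(VOCAB, DIM)   # stand-in for the LLM's token embeddings
target = torch.randn(DIM)             # stand-in for the attacker's target direction

def surrogate_loss(seq_embeds):
    # Toy objective: lower loss = sequence embedding closer to the target.
    return -torch.matmul(seq_embeds.mean(dim=0), target)

attack_ids = torch.randint(0, VOCAB, (SEQ_LEN,))   # random initial attack sequence

for _ in range(STEPS):
    # One-hot relaxation so gradients can flow back to the token choices.
    one_hot = torch.nn.functional.one_hot(attack_ids, VOCAB).float().requires_grad_(True)
    loss = surrogate_loss(one_hot @ embedding)
    loss.backward()

    # First-order estimate of the loss change for swapping position i to token v:
    # grad[i, v] - grad[i, current_token_i]  (HotFlip approximation).
    grad = one_hot.grad
    swap_scores = grad - grad.gather(1, attack_ids.unsqueeze(1))
    pos = torch.argmin(swap_scores.min(dim=1).values)   # most promising position
    new_tok = torch.argmin(swap_scores[pos])             # best substitute token there

    attack_ids = attack_ids.clone()
    attack_ids[pos] = new_tok

print("candidate attack token ids:", attack_ids.tolist())
```

In the actual attack described by the paper, the optimized sequence would then be embedded invisibly in an otherwise benign document rather than shown to the reader.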
The results show that retrieval poisoning is highly effective, with a success rate of 88.33% in the preliminary experiments and 66.67% in the real-world application. The attack works across various LLMs and is not tied to a specific augmented request, indicating broad applicability. The paper emphasizes the need for further research into the security of LLM-powered applications and for more effective mitigation strategies.
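To see why such content is hard for users to spot, and why the suggested defense of surfacing the retrieved source text can help, consider a naive document loader that extracts every text node from an HTML page, as many RAG pipelines do. The loader below, the CSS-hidden span element, and the placeholder string standing in for an optimized attack sequence are all illustrative assumptions, not the paper's concrete setup.

```python
# Minimal sketch: a naive HTML text extractor pulls CSS-hidden content into
# the RAG context even though a browser would never display it.
# The hidden string is a placeholder, not a real attack sequence.
from html.parser import HTMLParser

poisoned_html = """
<p>The project's stable release is version 2.4; see the official docs.</p>
<span style="display:none">[OPTIMIZED-ATTACK-SEQUENCE-PLACEHOLDER]</span>
"""

class NaiveTextExtractor(HTMLParser):
    """Collects every text node, with no notion of visibility."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)

extractor = NaiveTextExtractor()
extractor.feed(poisoned_html)
retrieved_text = " ".join(extractor.chunks)

# A human reading the rendered page sees only the first sentence, but the
# retrieved context handed to the LLM also contains the hidden sequence.
print(retrieved_text)
```

Displaying this retrieved text to the user, as the paper suggests, would expose the hidden sequence that the rendered page conceals.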