Human-Imperceptible Retrieval Poisoning Attacks in LLM-Powered Applications

Human-Imperceptible Retrieval Poisoning Attacks in LLM-Powered Applications

26 Apr 2024 | Quan Zhang, Binqi Zeng, Chijin Zhou, Gwihwan Go, Heyuan Shi, and Yu Jiang
This paper presents a new threat to LLM-powered applications, called retrieval poisoning, where attackers can manipulate the application to generate malicious responses during the retrieval augmented generation (RAG) process. Attackers can craft documents that are visually indistinguishable from benign ones, which, when used as reference sources for RAG, can mislead the application into generating incorrect responses. The attackers exploit the design features of LLM application frameworks to imperceptibly embed attack sequences in external documents, ensuring these sequences are retrieved and integrated into augmented requests. A gradient-guided mutation technique is introduced to generate effective attack sequences. The attack scenario involves users unknowingly referencing documents crafted by attackers, which contain invisible attack sequences. These sequences manipulate the LLM into generating responses with incorrect information. Preliminary experiments show that attackers can successfully mislead LLMs with an 88.33% success rate, and achieve a 66.67% success rate in real-world applications, demonstrating the potential impact of retrieval poisoning. The methodology involves analyzing LLM application frameworks to identify exploitable features, such as document parsers, text splitters, and prompt templates. Attackers can exploit these components to invisibly inject attack sequences into documents. The attack sequence is generated using a gradient-guided mutation technique, which ensures the sequence is effective in manipulating the LLM's response. The paper also presents preliminary experiments showing the effectiveness of retrieval poisoning across different LLMs and real-world applications. The results indicate that retrieval poisoning is a significant threat to LLM-powered applications, necessitating the development of more effective mitigation strategies. The paper highlights the need for further research into the intricacies of LLM application frameworks to enhance the security and reliability of LLM-powered applications.This paper presents a new threat to LLM-powered applications, called retrieval poisoning, where attackers can manipulate the application to generate malicious responses during the retrieval augmented generation (RAG) process. Attackers can craft documents that are visually indistinguishable from benign ones, which, when used as reference sources for RAG, can mislead the application into generating incorrect responses. The attackers exploit the design features of LLM application frameworks to imperceptibly embed attack sequences in external documents, ensuring these sequences are retrieved and integrated into augmented requests. A gradient-guided mutation technique is introduced to generate effective attack sequences. The attack scenario involves users unknowingly referencing documents crafted by attackers, which contain invisible attack sequences. These sequences manipulate the LLM into generating responses with incorrect information. Preliminary experiments show that attackers can successfully mislead LLMs with an 88.33% success rate, and achieve a 66.67% success rate in real-world applications, demonstrating the potential impact of retrieval poisoning. The methodology involves analyzing LLM application frameworks to identify exploitable features, such as document parsers, text splitters, and prompt templates. Attackers can exploit these components to invisibly inject attack sequences into documents. The attack sequence is generated using a gradient-guided mutation technique, which ensures the sequence is effective in manipulating the LLM's response. The paper also presents preliminary experiments showing the effectiveness of retrieval poisoning across different LLMs and real-world applications. The results indicate that retrieval poisoning is a significant threat to LLM-powered applications, necessitating the development of more effective mitigation strategies. The paper highlights the need for further research into the intricacies of LLM application frameworks to enhance the security and reliability of LLM-powered applications.
Reach us at info@study.space