WIPI: A New Web Threat for LLM-Driven Web Agents

WIPI: A New Web Threat for LLM-Driven Web Agents

26 Feb 2024 | Fangzhou Wu, Shutong Wu, Yulong Cao, Chaowei Xiao
The paper introduces a novel web threat called Web Indirect Prompt Injection (WIPI), which allows attackers to control LLM-driven Web Agents by embedding malicious instructions in publicly accessible webpages. WIPI operates in a black-box environment, focusing on the form and content of indirect instructions within external webpages to enhance the efficiency and stealthiness of the attack. The authors conducted extensive experiments using 7 plugin-based ChatGPT Web Agents, 8 Web GPTs, and 3 open-source Web Agents, achieving an average attack success rate (ASR) exceeding 90% even in pure black-box scenarios. Through an ablation study, they demonstrated that WIPI exhibits strong robustness across diverse prefix instructions. The paper highlights the vulnerability of current LLM-driven Web Agents to this new attack method and emphasizes the urgency to build more secure Web Agents.The paper introduces a novel web threat called Web Indirect Prompt Injection (WIPI), which allows attackers to control LLM-driven Web Agents by embedding malicious instructions in publicly accessible webpages. WIPI operates in a black-box environment, focusing on the form and content of indirect instructions within external webpages to enhance the efficiency and stealthiness of the attack. The authors conducted extensive experiments using 7 plugin-based ChatGPT Web Agents, 8 Web GPTs, and 3 open-source Web Agents, achieving an average attack success rate (ASR) exceeding 90% even in pure black-box scenarios. Through an ablation study, they demonstrated that WIPI exhibits strong robustness across diverse prefix instructions. The paper highlights the vulnerability of current LLM-driven Web Agents to this new attack method and emphasizes the urgency to build more secure Web Agents.
Reach us at info@study.space