Web Indirect Prompt Injection (WIPI) is a new web threat that exploits Web Agents through malicious instructions embedded in publicly accessible webpages. It allows attackers to control Web Agents indirectly, causing them to execute those instructions without the user's authorization. The attack is designed to be stealthy and efficient, leveraging the natural language processing capabilities of large language models (LLMs) to bypass traditional web security measures. It proceeds in two steps, retrieval and execution. In the retrieval step, the Web Agent calls web tools to fetch content from external websites, and that content may include malicious instructions. In the execution step, the Web Agent processes the fetched content and may carry out the malicious instructions it contains.
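The retrieval-then-execution flow can be illustrated with a minimal sketch. It assumes a naive agent that pastes fetched page text directly into the LLM prompt; `FAKE_WEB`, `fetch_page`, and `build_prompt` are hypothetical names, and the bracketed placeholder stands in for attacker-written text. None of this is taken from the study. The point it shows is that, once retrieved, page content and the user's request share one context with nothing marking the page content as untrusted.

```python
# Minimal sketch of a naive Web Agent's retrieve-then-execute flow.
# All names here (FAKE_WEB, fetch_page, build_prompt) are hypothetical.

# Stand-in for the public web: URL -> page text. A real agent would call a web tool.
FAKE_WEB = {
    "https://example.com/article": (
        "Welcome to our cooking blog. Today: tomato soup.\n"
        "[THIRD-PARTY INSTRUCTION PLACEHOLDER]"
    ),
}


def fetch_page(url: str) -> str:
    """Retrieval step: return the page text a web tool would hand back."""
    return FAKE_WEB.get(url, "")


def build_prompt(user_request: str, page_text: str) -> str:
    """Build the model input by naively concatenating trusted and untrusted text.

    Nothing in the resulting string tells the model which part came from the
    user and which part came from an arbitrary webpage.
    """
    return (
        "You are a helpful web assistant.\n"
        f"User request: {user_request}\n"
        f"Retrieved content:\n{page_text}\n"
        "Answer the user's request."
    )


if __name__ == "__main__":
    prompt = build_prompt("Summarize this page.", fetch_page("https://example.com/article"))
    print(prompt)
```

Whatever the page author writes ends up in the same prompt as the user's request. An LLM that follows instructions wherever they appear in its context may then act on that text, which corresponds to the execution step described above.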
To ensure the attack succeeds, the researchers designed a universal template whose strategies aim to bypass potential defenses and keep the Web Agent focused on the malicious instructions. These strategies include preset instruction negligence, confirmation requests, and multi-level repetition. The template is also designed to be imperceptible to users, which makes the attack stealthier. The researchers ran extensive experiments on a range of Web Agents, including ChatGPT, Web GPTs, and open-source agents, and found that the attack achieves an average success rate of over 90% in black-box scenarios. The results also show that the attack remains robust and effective across different user prefix instructions and web tools.
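This summary does not say how the template is made imperceptible. One common way content can be invisible in a browser yet still present in extracted text is CSS such as `display:none`, and the sketch below uses that as an assumed example, not as the study's method. It relies only on Python's standard `html.parser` and assumes a web tool that extracts every text node. `TextExtractor`, `extract`, and the sample page are hypothetical. The sketch contrasts what such a tool returns with a rough approximation of what a reader sees.

```python
from html.parser import HTMLParser

# Hypothetical page: the second paragraph is hidden from readers by inline CSS.
PAGE = """
<html><body>
  <p>Ten tips for better sleep.</p>
  <p style="display:none">[HIDDEN INSTRUCTION PLACEHOLDER]</p>
</body></html>
"""

# Void elements have no end tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TextExtractor(HTMLParser):
    """Collect text nodes, optionally skipping subtrees hidden with display:none.

    Only inline style="display:none" is recognized, and well-formed markup is
    assumed. Real rendering also depends on stylesheets, scripts, font sizes,
    colors, and more.
    """

    def __init__(self, skip_hidden: bool) -> None:
        super().__init__()
        self.skip_hidden = skip_hidden
        self.hidden_depth = 0  # > 0 while inside a hidden element
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = (dict(attrs).get("style") or "").replace(" ", "").lower()
        if self.hidden_depth or "display:none" in style:
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not (self.skip_hidden and self.hidden_depth):
            self.chunks.append(text)


def extract(html: str, skip_hidden: bool) -> str:
    """Return the page's text, with or without the hidden parts."""
    parser = TextExtractor(skip_hidden)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    print("Web tool returns:", extract(PAGE, skip_hidden=False))
    print("Reader sees:     ", extract(PAGE, skip_hidden=True))
```

Whether a particular agent's web tool behaves like the `skip_hidden=False` path depends on that tool. The sketch only shows that what a reader sees and what an LLM receives can differ.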
The study also evaluated the attack's stealthiness by testing it against traditional web security tools such as VirusTotal and IPQS, none of which detected WIPI. The researchers also conducted case studies of the potential security threats posed by WIPI, including phishing, identity theft, and malware infection. These findings highlight the need for more secure Web Agents to mitigate the risks of this new type of web threat.