AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents

AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents

18 Jul 2024 | Edoardo Debenedetti, Jie Zhang, Mislav Balunovic, Luca Beurer-Kellner, Marc Fischer, Florian Tramèr
AgentDojo is a dynamic evaluation framework designed to assess the robustness of AI agents against prompt injection attacks. It provides a realistic environment where agents interact with external tools and handle untrusted data. The framework includes 97 realistic tasks, such as managing emails, navigating e-banking websites, and making travel bookings, along with 629 security test cases and various attack and defense paradigms. AgentDojo is not a static test suite but an extensible environment that allows for the design and evaluation of new agent tasks, defenses, and adaptive attacks. The framework evaluates agents based on their ability to complete tasks in the absence of attacks and their resilience to prompt injection attacks. It measures the success rate of agents in completing tasks and the success rate of attackers in executing malicious actions. AgentDojo also evaluates the effectiveness of different defense mechanisms against prompt injections. The framework includes a variety of components, such as environments that simulate real-world applications, tools that allow agents to interact with these environments, and user and injection tasks that define the goals of agents and attackers. AgentDojo supports both benign and adversarial scenarios, allowing for the assessment of agent performance in realistic conditions. The evaluation of AgentDojo includes a range of metrics, such as benign utility, utility under attack, and targeted attack success rate. These metrics help to quantify the effectiveness of agents and defenses in handling prompt injection attacks. The framework also allows for the testing of different attack and defense strategies, providing insights into the challenges faced by both attackers and defenders. Overall, AgentDojo serves as a benchmarking environment for evaluating the robustness of AI agents against prompt injection attacks. It provides a dynamic and extensible framework that can be used to assess the performance of agents and defenses in a variety of scenarios. The framework is designed to foster research into new design principles for AI agents that can reliably and securely solve common tasks.AgentDojo is a dynamic evaluation framework designed to assess the robustness of AI agents against prompt injection attacks. It provides a realistic environment where agents interact with external tools and handle untrusted data. The framework includes 97 realistic tasks, such as managing emails, navigating e-banking websites, and making travel bookings, along with 629 security test cases and various attack and defense paradigms. AgentDojo is not a static test suite but an extensible environment that allows for the design and evaluation of new agent tasks, defenses, and adaptive attacks. The framework evaluates agents based on their ability to complete tasks in the absence of attacks and their resilience to prompt injection attacks. It measures the success rate of agents in completing tasks and the success rate of attackers in executing malicious actions. AgentDojo also evaluates the effectiveness of different defense mechanisms against prompt injections. The framework includes a variety of components, such as environments that simulate real-world applications, tools that allow agents to interact with these environments, and user and injection tasks that define the goals of agents and attackers. AgentDojo supports both benign and adversarial scenarios, allowing for the assessment of agent performance in realistic conditions. The evaluation of AgentDojo includes a range of metrics, such as benign utility, utility under attack, and targeted attack success rate. These metrics help to quantify the effectiveness of agents and defenses in handling prompt injection attacks. The framework also allows for the testing of different attack and defense strategies, providing insights into the challenges faced by both attackers and defenders. Overall, AgentDojo serves as a benchmarking environment for evaluating the robustness of AI agents against prompt injection attacks. It provides a dynamic and extensible framework that can be used to assess the performance of agents and defenses in a variety of scenarios. The framework is designed to foster research into new design principles for AI agents that can reliably and securely solve common tasks.
Reach us at info@study.space