[slides and audio] Agent-E: From Autonomous Web Navigation to Foundational Design Principles in Agentic Systems

Agent-E is a novel web agent that improves upon existing state-of-the-art web agents through architectural innovations: a hierarchical architecture, flexible DOM distillation and denoising, and change observation. Evaluated in autonomous mode on the WebVoyager benchmark, it achieves a 73.2% success rate, a new state-of-the-art result that represents a 20% improvement over previous text-only web agents and a 16% improvement over previous multi-modal web agents. Agent-E supports multiple DOM observation methods, provides linguistic feedback to improve agent performance, and includes a change observation mechanism that helps the agent understand the consequences of its actions. Its hierarchical architecture allows for error detection and recovery and supports human-in-the-loop workflows. The agent handles complex tasks, such as finding the price of a subscription and the minimum number of users required for it, and is evaluated on a variety of tasks, including searching for hotels, booking flights, and interacting with websites. Performance is measured by task success rate, task completion time, and the number of LLM calls per task; Agent-E achieves a higher success rate than previous agents while using LLM calls more efficiently. Agent-E also introduces design principles for agentic systems: the use of domain-specific skills, DOM distillation and denoising, hierarchical architecture, and self-improvement. These principles apply beyond web automation and can be used to develop more effective agentic systems.

Agent-E: From Autonomous Web Navigation to Foundational Design Principles in Agentic Systems

July, 2024 | Tamer Abuelsaad, Deepak Akkil, Prasenjit Dey, Ashish Jagmohan, Aditya Vempaty, Ravi Kokku