WILBUR: Adaptive In-Context Learning for Robust and Accurate Web Agents

8 Apr 2024 | Michael Lutz, Arth Bohra*, Manvel Saroyan, Artem Harutyunyan, Giovanni Campagna
WILBUR is a web agent that improves task success rates by learning from past experience and adapting to new websites. It uses a differentiable ranking model and a novel instruction synthesis technique to populate a large language model's prompt with the most useful task demonstrations, and it includes an intelligent backtracking mechanism that learns from mistakes. The agent can be trained on data from a generative auto-curriculum, which samples goals from an LLM and automatically evaluates the agent's performance.
On the WebVoyager benchmark, WILBUR achieves a new text-only state-of-the-art result of 53%, outperforming text-only models by 8% and coming within 5% of a strong multimodal model. The results show that WILBUR can navigate the web effectively even without access to visual information. Its success is attributed to its ability to explore, reflect, and backtrack, together with its demonstration ranking model and instruction synthesis. Because it learns from both successful and unsuccessful executions, WILBUR generalizes across multiple websites and improves over time. The approach highlights the importance of learning the websites where an agent is applied and demonstrates the potential of adaptive in-context learning for web agents.
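To make the idea of populating a prompt with ranked demonstrations concrete, here is a minimal Python sketch. It is not WILBUR's implementation: the summary describes a learned, differentiable ranking model, whereas this sketch swaps in a plain bag-of-words cosine similarity as a stand-in. The names `Demonstration`, `embed`, `select_demonstrations`, and `build_prompt`, as well as the prompt layout, are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Demonstration:
    goal: str      # natural-language goal of a past run
    trace: str     # textual record of the actions taken in that run
    success: bool  # whether the run achieved its goal


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption: stands in for a learned scorer)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_demonstrations(goal: str, pool: list[Demonstration], k: int = 2) -> list[Demonstration]:
    """Rank past runs by similarity to the current goal and keep the top k."""
    query = embed(goal)
    ranked = sorted(pool, key=lambda d: cosine(query, embed(d.goal)), reverse=True)
    return ranked[:k]


def build_prompt(goal: str, demos: list[Demonstration]) -> str:
    """Place the selected demonstrations, labeled by outcome, before the new task."""
    parts = []
    for d in demos:
        label = "SUCCESSFUL" if d.success else "FAILED"
        parts.append(f"[{label} EXAMPLE]\nGoal: {d.goal}\nActions: {d.trace}")
    parts.append(f"[CURRENT TASK]\nGoal: {goal}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    pool = [
        Demonstration("book a flight to Paris", "click 'Flights'; type 'Paris'", True),
        Demonstration("order pancakes online", "click 'Menu'; click 'Checkout'", True),
        Demonstration("search flight prices to Paris", "type 'Paris'; page timed out", False),
    ]
    goal = "find a cheap flight to Paris"
    print(build_prompt(goal, select_demonstrations(goal, pool, k=2)))
```

Failed runs are kept in the pool and labeled rather than discarded, which mirrors the summary's point that the agent learns from both successful and unsuccessful executions. A learned ranker would replace `cosine` with a trained scoring function, leaving the selection and prompt-building steps unchanged.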