MobileAgentBench: An Efficient and User-Friendly Benchmark for Mobile LLM Agents

12 Jun 2024 | Luyuan Wang, Yongyu Deng, Yiwei Zha, Guodong Mao, Qinmin Wang, Tianchen Min, Wei Chen, Shoufa Chen
The paper introduces MobileAgentBench, an efficient and user-friendly benchmark designed to evaluate the performance of mobile LLM agents. MobileAgentBench addresses the challenges of extensive manual testing and the complexity of app states and feasible action sequences. It defines 100 tasks across 10 open-source apps, categorized by difficulty levels, and evaluates several existing mobile agents, including AppAgent and MobileAgent. The benchmark is accessible on a dedicated webpage and supports real Android devices, providing a robust and versatile testing environment. Key contributions include a fully autonomous and reliable evaluation process, a simplified extension mechanism, and an innovative method for determining task success based on the final UI state. The paper also discusses related work, experimental results, and future improvements, highlighting the potential of MobileAgentBench in advancing the field of mobile LLM agents.
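The idea of judging success from the final UI state, rather than from the exact sequence of actions, can be illustrated with a minimal sketch. This summary does not describe the benchmark's actual API, so everything below is an assumption made for illustration: the `UINode` tree, the `Task` class, and the `final_state_contains` and `evaluate` helpers are hypothetical names, not taken from MobileAgentBench.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List


@dataclass
class UINode:
    """Hypothetical, simplified element of an app's UI hierarchy."""
    text: str = ""
    children: List["UINode"] = field(default_factory=list)

    def walk(self) -> Iterator["UINode"]:
        # Depth-first traversal over this node and all descendants.
        yield self
        for child in self.children:
            yield from child.walk()


@dataclass
class Task:
    """Hypothetical task: a description plus a check on the final UI state."""
    description: str
    is_success: Callable[[UINode], bool]


def final_state_contains(text: str) -> Callable[[UINode], bool]:
    """Build a check that passes if any node in the final UI shows `text`."""
    def check(root: UINode) -> bool:
        return any(text in node.text for node in root.walk())
    return check


def evaluate(task: Task, final_ui: UINode) -> bool:
    # Only the end state is inspected; the path the agent took is ignored,
    # so any feasible action sequence that reaches the goal counts.
    return task.is_success(final_ui)


# Illustrative data (invented for this sketch, not from the paper).
task = Task("Create a note titled 'Groceries'", final_state_contains("Groceries"))
final_ui = UINode("Notes", [UINode("Groceries"), UINode("Settings")])
result = evaluate(task, final_ui)
```

Under these assumptions, a task stays correct even when an app offers several routes to the same goal, which is the difficulty with feasible action sequences that the summary mentions.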