26 Mar 2024 | Longtiao Zheng, Zhiyuan Huang, Zhenghai Xue, Xinrun Wang, Bo An, Shuicheng Yan
AgentStudio is an open-source toolkit for developing general virtual agents that can interact with any digital device. It addresses the difficulty of building and evaluating such agents in real-world environments by covering the entire agent-development lifecycle in one integrated platform: environment setup, data collection, online testing and evaluation, and result visualization. The environment is online, realistic, and compatible with diverse operating systems and devices, so agents can be developed for open-domain, real-world scenarios.
The toolkit supports both function calling and human-computer interfaces through a highly generic observation and action space. The universal action space combines high-level function calls with low-level atomic operations, which lets agents interact with arbitrary software. The observation space is multimodal, covering screen recordings, screenshots, and code execution results. Agents can also create reusable code scripts as tools, developing skills by combining basic operations or by writing tools that simplify later decision-making. Finally, AgentStudio includes a complete pipeline for data annotation and in-the-wild evaluation, with interactive graphical user interfaces that make it efficient to build open-domain datasets and benchmarks in real-world settings.
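To make the idea of a universal action space more concrete, here is a minimal Python sketch. It assumes a hypothetical design in which an action is either a low-level atomic GUI operation (`Click`, `TypeText`, `KeyPress`), a high-level `FunctionCall`, or a `Tool`, meaning a reusable script composed of other actions, and in which every step returns a multimodal `Observation`. All class names and the `MockEnv` executor are illustrative assumptions, not AgentStudio's actual API; the mock only logs GUI operations instead of driving a real mouse and keyboard.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable, Union


# --- Hypothetical low-level atomic operations (not AgentStudio's real API) ---
@dataclass
class Click:
    x: int
    y: int


@dataclass
class TypeText:
    text: str


@dataclass
class KeyPress:
    key: str


# --- Hypothetical high-level function call, e.g. an application API ---
@dataclass
class FunctionCall:
    name: str
    kwargs: dict[str, Any] = field(default_factory=dict)


# --- A tool: a named, reusable script built from other actions ---
@dataclass
class Tool:
    name: str
    steps: list[Action]


Action = Union[Click, TypeText, KeyPress, FunctionCall, Tool]


# --- Multimodal observation returned after each step ---
@dataclass
class Observation:
    screenshot: bytes | None = None                          # e.g. PNG bytes
    video_frames: list[bytes] = field(default_factory=list)  # screen recording
    code_output: str | None = None                           # result of executed code


class MockEnv:
    """Toy executor: logs GUI operations and dispatches function calls."""

    def __init__(self) -> None:
        self.log: list[str] = []
        self.functions: dict[str, Callable[..., Any]] = {}

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self.functions[name] = fn

    def step(self, action: Action) -> Observation:
        if isinstance(action, Tool):
            # Expand the tool into its steps; return the last observation.
            obs = Observation()
            for sub in action.steps:
                obs = self.step(sub)
            return obs
        if isinstance(action, FunctionCall):
            result = self.functions[action.name](**action.kwargs)
            return Observation(code_output=str(result))
        # Atomic GUI operation: a real environment would move the mouse or
        # press keys here; this mock only records the operation.
        self.log.append(repr(action))
        return Observation(screenshot=b"")  # placeholder screenshot


env = MockEnv()
env.register("open_url", lambda url: f"opened {url}")
search = Tool("search_web", [Click(640, 80), TypeText("AgentStudio"), KeyPress("enter")])

obs = env.step(FunctionCall("open_url", {"url": "https://example.com"}))
print(obs.code_output)  # opened https://example.com
env.step(search)
print(env.log)
```

Treating a tool as just another action is one way an agent could build skills: once `search_web` exists, the agent can invoke it in a single step instead of re-planning the three atomic operations it contains.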
AgentStudio has been used to build a visual GUI grounding dataset and a real-world benchmark suite covering a variety of tasks. Evaluating current multimodal models on the grounding dataset highlights the need for further research in this area. The work also distills several actionable insights: the importance of general GUI grounding; learning from documents and video demonstrations; tool creation, selection, and use; and the development of a generalist critic model. Overall, AgentStudio provides a comprehensive platform for developing and evaluating general virtual agents and has the potential to significantly advance research in this area.
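One common way to score GUI grounding is point-in-box accuracy: the model predicts a click coordinate for a described UI element, and the prediction counts as correct if it falls inside that element's ground-truth bounding box. The sketch below assumes this metric; the `BBox` type and `grounding_accuracy` function are hypothetical, and AgentStudio's own evaluation protocol may differ in detail.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class BBox:
    """Hypothetical ground-truth box in screen pixels (y grows downward)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        # Edges count as inside the box.
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def grounding_accuracy(
    preds: Sequence[Tuple[float, float]], targets: Sequence[BBox]
) -> float:
    """Fraction of predicted click points that land inside their target box."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(box.contains(x, y) for (x, y), box in zip(preds, targets))
    return hits / len(targets)


targets = [BBox(10, 10, 110, 40), BBox(200, 300, 260, 330), BBox(0, 0, 50, 50)]
preds = [(60, 25), (250, 340), (49.5, 0)]  # the second click misses (y = 340 > 330)
acc = grounding_accuracy(preds, targets)
print(acc)
```

A point-in-box criterion is simple and tolerant of small offsets within an element, but it says nothing about how close a miss was; distance-based scores are an alternative if finer-grained feedback is needed.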