AGENTGYM: Evolving Large Language Model-based Agents across Diverse Environments

6 Jun 2024 | Zhiheng Xi*, Yiwen Ding*, Wenxiang Chen*, Boyang Hong, Honglin Guo, Junzhe Wang, Dingwen Yang, Chenyang Liao, Xin Guo, Wei He, Songyang Gao, Lu Chen, Rui Zheng, Yicheng Zou, Tao Gui†, Qi Zhang†, Xipeng Qiu, Xuanjing Huang, Zuxuan Wu, Yu-Gang Jiang
AGENTGYM is a new framework designed to help the AI community develop generally capable large language model (LLM)-based agents that can evolve themselves across diverse environments. The work rests on three key pillars: diverse environments and tasks, a trajectory set for training a base agent, and an effective, scalable evolution method. Concretely, AGENTGYM provides an easily scalable interactive platform with 14 environments spanning diverse categories and 89 tasks, supporting real-time feedback and concurrent agent exploration; a database of expanded instructions; the AGENTEval benchmark suite for evaluating LLM-based agents; and two sets of high-quality trajectories across environments, AGENTTRAJ and AGENTTRAJ-L. Building on these, the paper introduces AGENTEVOL, a novel method for exploring self-evolution in LLM-based agents, which enables agents to evolve beyond previously seen data across multiple tasks and environments through exploration and learning. Experimental results show that the evolved agents achieve results comparable to state-of-the-art (SOTA) models. The full AGENTGYM suite, including the platform, dataset, benchmark, checkpoints, and algorithm implementations, is released for the community.
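The platform's real-time, concurrent exploration can be pictured as a standard observe-act-feedback loop run over many environment instances at once. The sketch below is a toy illustration, not AGENTGYM's actual API: the `EchoEnv` class, `dummy_agent`, and `rollout` helper are all hypothetical stand-ins.

```python
import concurrent.futures

class EchoEnv:
    """Toy stand-in for an environment client.

    The real platform hosts many environments; this hypothetical class
    only illustrates the observe -> act -> feedback loop."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.steps = 0

    def reset(self):
        self.steps = 0
        return f"task {self.task_id}: initial observation"

    def step(self, action):
        self.steps += 1
        done = self.steps >= 3          # episode ends after 3 steps
        reward = 1.0 if done else 0.0   # terminal reward only
        return f"feedback for {action!r}", reward, done

def dummy_agent(observation):
    # A real agent would query an LLM here; we return a fixed action.
    return "noop"

def rollout(env):
    """Run one episode and return the collected trajectory."""
    trajectory, obs, done = [], env.reset(), False
    while not done:
        action = dummy_agent(obs)
        obs, reward, done = env.step(action)
        trajectory.append((action, obs, reward))
    return trajectory

# Concurrent exploration across several environment instances.
envs = [EchoEnv(i) for i in range(4)]
with concurrent.futures.ThreadPoolExecutor() as pool:
    trajectories = list(pool.map(rollout, envs))
```

Each worker collects one trajectory independently, which is what makes the platform's concurrency useful for gathering exploration data at scale.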
The paper also discusses the challenges of training LLM-based agents in isolated environments and the importance of self-evolution for generalization. Overall, the framework provides a comprehensive solution for developing generally capable LLM-based agents that can evolve themselves across diverse environments.
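The evolution idea, alternating exploration with learning so the agent improves beyond its initial trajectory data, can be sketched as a simplified self-training loop. This is a hedged illustration of the general pattern, not the paper's actual AGENTEVOL algorithm: the toy `skill` parameter, the reward-filtering rule, and the task names are all assumptions made for the sketch.

```python
import random

random.seed(0)  # deterministic toy run

def explore(policy, tasks, samples_per_task=4):
    """Sample trajectories with the current policy and keep the
    successful ones. Success here is a coin flip whose probability
    is the policy's toy 'skill' parameter."""
    kept = []
    for task in tasks:
        for _ in range(samples_per_task):
            reward = 1.0 if random.random() < policy["skill"] else 0.0
            if reward >= 1.0:
                kept.append((task, reward))
    return kept

def learn(policy, base_set, new_trajectories):
    """'Fine-tune' on the base set plus newly collected successes.
    Training is faked by nudging the skill parameter upward in
    proportion to the share of new data."""
    data = base_set + new_trajectories
    nudge = 0.05 * len(new_trajectories) / max(len(data), 1)
    policy["skill"] = min(1.0, policy["skill"] + nudge)
    return policy

policy = {"skill": 0.3}                      # base agent from imitation data
base_set = [("seed_task", 1.0)]              # trajectories behind the base agent
tasks = ["webshop", "alfworld", "science"]   # illustrative task names

for iteration in range(3):                   # alternate exploration and learning
    successes = explore(policy, tasks)
    policy = learn(policy, base_set, successes)
```

The key property this loop shares with evolution methods of this kind is that later iterations train on data the agent itself discovered, so the training set grows beyond what the base agent was originally shown.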