AGENTGYM: Evolving Large Language Model-based Agents across Diverse Environments

6 Jun 2024 | Zhiheng Xi*, Yiwen Ding*, Wenxiang Chen*, Boyang Hong, Honglin Guo, Junzhe Wang, Dingwen Yang, Chenyang Liao, Xin Guo, Wei He, Songyang Gao, Lu Chen, Rui Zheng, Yicheng Zou, Tao Gui†, Qi Zhang†, Xipeng Qiu, Xuanjing Huang, Zuxuan Wu, Yu-Gang Jiang
The paper introduces AGENTGYM, a framework for evolving large language model (LLM)-based agents across diverse environments and tasks. It targets two limitations of current approaches: methods that rely on human supervision for step-by-step imitation, and methods that produce specialist agents with limited generalization. AGENTGYM comprises an interactive platform with 14 environments and 89 tasks, an expanded instruction set, and a collection of high-quality trajectories for training.

The authors also propose AGENTEVAL, a benchmark suite for evaluating the potential of agent self-evolution, and introduce the AGENTEVOL algorithm, in which agents evolve by exploring new environments and tasks and learning from the resulting experience. Experimental results show that the evolved agents achieve performance comparable to or better than state-of-the-art (SOTA) models across multiple environments and tasks. The paper closes by discussing limitations and future directions, emphasizing safety and ethical considerations in the development of self-evolving agents.
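The summary does not spell out AGENTEVOL's training procedure, but its description of agents evolving by exploring environments and learning from experience suggests an alternating explore-then-learn cycle. Below is a minimal, hypothetical sketch of such a cycle; the interfaces (act, reset, step, fine_tune), the reward-based filtering, and the loop structure are illustrative assumptions rather than the paper's exact method.

```python
# Hypothetical sketch of an alternating explore-then-learn self-evolution loop.
# The agent/environment interfaces (act, reset, step, fine_tune) are assumed
# here for illustration; they are not the paper's actual AGENTEVOL API.

def sample_trajectory(agent, env, max_steps=20):
    """Roll out the agent in one environment; return (trajectory, final reward)."""
    observation = env.reset()
    trajectory = []
    for _ in range(max_steps):
        action = agent.act(observation)            # policy proposes the next action
        observation, reward, done = env.step(action)
        trajectory.append((observation, action))
        if done:
            return trajectory, reward
    return trajectory, 0.0                         # timed out without success

def evolve(agent, envs, iterations=3, rollouts_per_env=8):
    """Alternate exploration (collect rollouts) with learning (train on successes)."""
    for iteration in range(iterations):
        experience = []
        for env in envs:                           # explore across diverse environments
            for _ in range(rollouts_per_env):
                trajectory, reward = sample_trajectory(agent, env)
                if reward > 0:                     # keep only rewarded trajectories
                    experience.append(trajectory)
        agent.fine_tune(experience)                # update the policy on its own successes
        print(f"iteration {iteration}: trained on {len(experience)} trajectories")
    return agent
```

The key design idea this sketch tries to capture is that supervision comes from the agent's own filtered interaction data rather than from step-by-step human demonstrations, which is what distinguishes self-evolution from imitation-style training.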