The paper introduces a novel paradigm for training Large Language Model (LLM) agents without modifying the LLM weights. Inspired by how humans forge tools, the approach progressively updates the agent's functions to better solve downstream tasks. The authors propose AgentOptimizer, a method that leverages the LLM itself to update the agent's functions based on execution history and task performance. The training process incorporates two strategies, roll-back and early-stop, to prevent performance degradation. Extensive experiments on diverse tasks, including mathematical reasoning, tabular processing, and general real-world problems, demonstrate significant performance improvements for both GPT-4+ and ReAct agents. The method is also evaluated for domain transferability and behavior with large-scale training data, demonstrating its practical utility and generalization capability. The paper concludes by discussing related work and the potential positive and negative societal impacts of the approach.
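
To make the training loop concrete, below is a minimal sketch of the described procedure: an LLM-backed proposer revises the agent's function set from execution history, a roll-back discards updates that hurt performance, and an early-stop halts training after repeated failed updates. All names here (propose_functions, evaluate_agent, patience) are hypothetical illustrations, not the paper's actual API.

```python
import copy
from typing import Callable

def optimize_agent_functions(
    functions: dict,                                   # agent's current function/tool set
    propose_functions: Callable[[dict, list], dict],   # LLM-backed proposer (assumed interface)
    evaluate_agent: Callable[[dict], tuple],           # returns (score, execution_history)
    max_epochs: int = 10,
    patience: int = 3,                                 # early-stop budget (assumed hyperparameter)
) -> dict:
    """Sketch of an AgentOptimizer-style loop: the LLM weights are never
    touched; only the agent's function set is updated."""
    best_functions = copy.deepcopy(functions)
    best_score, history = evaluate_agent(best_functions)
    failures = 0

    for _ in range(max_epochs):
        # Ask the LLM to revise the function set given the execution history.
        candidate = propose_functions(best_functions, history)
        score, history = evaluate_agent(candidate)

        if score > best_score:
            # Keep the improved function set and reset the failure counter.
            best_functions, best_score = copy.deepcopy(candidate), score
            failures = 0
        else:
            # Roll-back: discard an update that degraded performance.
            failures += 1
            if failures >= patience:
                break  # early-stop: no improvement within the patience budget

    return best_functions
```

The key design point this sketch illustrates is that optimization acts on the function set rather than on model parameters, so roll-back is cheap: reverting an update is just restoring the previous function definitions.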