Learning to Use Tools via Cooperative and Interactive Agents with Large Language Models

22 Jun 2024 | Zhengliang Shi, Shen Gao, Xiuyi Chen, Yue Feng, Lingyong Yan, Haibo Shi, Dawei Yin, Pengjie Ren, Suzan Verberne, Zhaochun Ren
This paper proposes ConAgents, a cooperative and interactive agent framework for tool learning tasks. ConAgents coordinates three specialized agents: a grounding agent, an execution agent, and a review agent, which work together to solve complex tasks. The grounding agent decomposes a task into sub-tasks and generates a tool-use plan. The execution agent follows the plan and executes each selected tool by generating executable code. The review agent checks the plan and the execution results for errors, providing feedback for revision. To enable dynamic cooperation among these agents, two communication protocols are introduced: automatic and adaptive interaction. With automatic interaction, the review agent provides real-time reviews to calibrate incorrect actions; with adaptive interaction, it provides feedback only when errors are captured during tool execution. To generalize ConAgents to open-source models, the authors propose specialized action distillation (SPAN), which enhances those models' ability to perform the specialized actions in the framework. Extensive experiments on three datasets show that LLMs equipped with ConAgents outperform baselines by a substantial margin (up to a 14% higher success rate). The contributions of this work are: (1) ConAgents, a cooperative and interactive agent framework for tool learning tasks; (2) specialized action distillation (SPAN), which enables open-source models to work more effectively within ConAgents; (3) automatic and human evaluation on two benchmarks validating the superiority of ConAgents. The paper also discusses related work, including LLMs for tool learning and multi-agent cooperation, and presents the methodology: the overall framework, the specialized agents, and the agent communication protocols.
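The three-agent loop and the two communication protocols can be sketched in code. This is a hypothetical illustration of the control flow described above, not the paper's implementation: the agent classes, method names, tool names, and the `mode` switch are all assumptions made for clarity.

```python
# Hypothetical sketch of the ConAgents workflow: grounding -> execution -> review.
# All class/method/tool names here are illustrative, not from the paper.

class GroundingAgent:
    def plan(self, task):
        # Decompose the task into sub-tasks, each paired with a selected tool.
        return [("search_movie", task)]

class ExecutionAgent:
    def execute(self, subtask):
        # Generate and run executable code for the selected tool.
        tool, arg = subtask
        if tool == "search_movie":
            return {"status": "ok", "result": f"results for {arg!r}"}
        return {"status": "error", "result": f"unknown tool {tool!r}"}

class ReviewAgent:
    def review(self, subtask, outcome):
        # Inspect planning/execution; return feedback only if something is wrong.
        return None if outcome["status"] == "ok" else "revise the tool choice"

def solve(task, mode="adaptive"):
    grounding, execution, review = GroundingAgent(), ExecutionAgent(), ReviewAgent()
    results = []
    for subtask in grounding.plan(task):
        outcome = execution.execute(subtask)
        if mode == "automatic":
            # Automatic interaction: the reviewer checks every action in real time.
            feedback = review.review(subtask, outcome)
        else:
            # Adaptive interaction: the reviewer is invoked only on execution errors.
            feedback = (review.review(subtask, outcome)
                        if outcome["status"] == "error" else None)
        if feedback is not None:
            outcome = execution.execute(subtask)  # retry after revision
        results.append(outcome["result"])
    return results
```

The design choice the two protocols trade off is visible here: automatic interaction pays a review call on every step for earlier error detection, while adaptive interaction saves those calls and reviews only when execution raises an error.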
The experiments are conducted on two benchmarks, RestBench and ToolBench, and the results show that ConAgents outperforms baselines in terms of success rate and correct path rate. The paper also discusses the limitations of the proposed framework and the ethical considerations of using large language models.