Learning to Use Tools via Cooperative and Interactive Agents with Large Language Models

22 Jun 2024 | Zhengliang Shi, Shen Gao, Xiuyi Chen, Yue Feng, Lingyong Yan, Haibo Shi, Dawei Yin, Pengjie Ren, Suzan Verberne, Zhaochun Ren
This paper proposes ConAgents, a cooperative and interactive agent framework for tool learning tasks. ConAgents coordinates three specialized agents: a grounding agent, an execution agent, and a review agent, which work together to solve complex tasks. The grounding agent decomposes a task into sub-tasks and generates a tool-use plan. The execution agent follows the plan and executes each selected tool by generating executable code. The review agent checks the plan and the execution results for errors, providing feedback for revision. To enable dynamic cooperation among these agents, two communication protocols are introduced: automatic and adaptive interaction. With automatic interaction, the review agent provides real-time reviews to calibrate incorrect actions; with adaptive interaction, it provides feedback only when errors are captured during tool execution. To generalize ConAgents to open-source models, the authors propose specialized action distillation (SPAN), which enhances those models' ability to perform the specialized actions in the framework. Extensive experiments on three datasets show that LLMs equipped with ConAgents outperform baselines by a substantial margin (up to a 14% higher success rate). The contributions of this work are: (1) ConAgents, a cooperative and interactive agent framework for tool learning tasks; (2) specialized action distillation (SPAN), which enables open-source models to work more effectively within ConAgents; (3) automatic and human evaluation on two benchmarks validating the superiority of ConAgents. The paper also discusses related work, including LLMs for tool learning and multi-agent cooperation, and presents the methodology: the overall framework, the specialized agents, and the agent communication protocols.
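The three-agent loop and the two communication protocols can be sketched in code. This is a hypothetical illustration of the control flow described above, not the paper's implementation: the agent classes, method names, tool names, and the `mode` switch are all assumptions made for clarity.

```python
# Hypothetical sketch of the ConAgents workflow: grounding -> execution -> review.
# All class/method/tool names here are illustrative, not from the paper.

class GroundingAgent:
    def plan(self, task):
        # Decompose the task into sub-tasks, each paired with a selected tool.
        return [("search_movie", task)]

class ExecutionAgent:
    def execute(self, subtask):
        # Generate and run executable code for the selected tool.
        tool, arg = subtask
        if tool == "search_movie":
            return {"status": "ok", "result": f"results for {arg!r}"}
        return {"status": "error", "result": f"unknown tool {tool!r}"}

class ReviewAgent:
    def review(self, subtask, outcome):
        # Inspect planning/execution; return feedback only if something is wrong.
        return None if outcome["status"] == "ok" else "revise the tool choice"

def solve(task, mode="adaptive"):
    grounding, execution, review = GroundingAgent(), ExecutionAgent(), ReviewAgent()
    results = []
    for subtask in grounding.plan(task):
        outcome = execution.execute(subtask)
        if mode == "automatic":
            # Automatic interaction: the reviewer checks every action in real time.
            feedback = review.review(subtask, outcome)
        else:
            # Adaptive interaction: the reviewer is invoked only on execution errors.
            feedback = (review.review(subtask, outcome)
                        if outcome["status"] == "error" else None)
        if feedback is not None:
            outcome = execution.execute(subtask)  # retry after revision
        results.append(outcome["result"])
    return results
```

The design choice the two protocols trade off is visible here: automatic interaction pays a review call on every step for earlier error detection, while adaptive interaction saves those calls and reviews only when execution raises an error.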
The experiments are conducted on two benchmarks, RestBench and ToolBench, and the results show that ConAgents outperforms baselines in terms of success rate and correct path rate. The paper also discusses the limitations of the proposed framework and the ethical considerations of using large language models.