6 Feb 2024 | Yu Du, Fangyun Wei, Hongyang Zhang
**AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls**
**Authors:** Yu Du, Fangyun Wei, Hongyang Zhang
**Institutions:** Tsinghua University, Microsoft Research Asia, University of Waterloo
**Abstract:**
AnyTool is a large language model agent designed to draw on a vast array of tools when addressing user queries. It uses over 16,000 APIs from Rapid API, on the assumption that some subset of these APIs can resolve a given query. AnyTool has three key elements: a hierarchical API retriever, a solver that resolves user queries using the selected API candidates, and a self-reflection mechanism that re-activates AnyTool if the initial solution proves impracticable. AnyTool is powered by the function calling feature of GPT-4, so no external modules need to be trained. The evaluation protocol of prior work is revisited, and an additional benchmark, AnyToolBench, is introduced to better reflect practical application scenarios. Experiments across various datasets show that AnyTool outperforms strong baselines such as ToolLLM and a GPT-4 variant tailored for tool utilization; for instance, it improves on ToolLLM by +35.4% in average pass rate on ToolBench.
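The interplay of the three elements can be pictured as a closed loop: retrieve candidates, try to solve, and re-activate retrieval if solving fails. The following is a minimal sketch under stated assumptions, not the authors' implementation. The names `run_with_reflection`, `Attempt`, `toy_retrieve`, and `toy_solve` are hypothetical, and the toy stubs stand in for what would be GPT-4 function calls in AnyTool.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Attempt:
    """Record of one failed round: which API candidates were tried."""
    candidates: List[str]


def run_with_reflection(
    query: str,
    retrieve: Callable[[str, List[Attempt]], List[str]],
    solve: Callable[[str, List[str]], Optional[str]],
    max_rounds: int = 3,
) -> Optional[str]:
    """Retrieve, solve, and re-activate retrieval when solving fails."""
    history: List[Attempt] = []
    for _ in range(max_rounds):
        candidates = retrieve(query, history)
        answer = solve(query, candidates)
        if answer is not None:
            return answer
        # Self-reflection step (assumed form): remember what failed so
        # the next retrieval round can avoid it.
        history.append(Attempt(candidates))
    return None


# Hypothetical stand-ins for the LLM-driven retriever and solver.
def toy_retrieve(query: str, history: List[Attempt]) -> List[str]:
    pool = ["geo.geocode", "weather.lookup", "news.search"]
    tried = {api for attempt in history for api in attempt.candidates}
    return [api for api in pool if api not in tried][:1]


def toy_solve(query: str, candidates: List[str]) -> Optional[str]:
    if "weather.lookup" in candidates:
        return "forecast via weather.lookup"
    return None  # the solver gives up with these candidates


result = run_with_reflection("weather in Paris", toy_retrieve, toy_solve)
print(result)
```

In this toy run the first round retrieves an unhelpful API and fails; the second round, informed by the failure history, retrieves a different candidate and succeeds.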
**Introduction:**
The introduction traces the evolution of tools and the role of large language models (LLMs) in making them more effective. AnyTool is designed to leverage over 16,000 APIs to address user queries, and combines a hierarchical API retriever, a solver, and a self-reflection mechanism. The retriever's hierarchy consists of a meta-agent, category agents, and tool agents, each of which manages a collection of functions for exploring the API space. When a query remains unsolved, the self-reflection mechanism lets AnyTool review and analyze it, which improves both the efficiency and the effectiveness of query resolution.
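The three-level hierarchy can be illustrated with a small sketch. It assumes a toy catalog and replaces the LLM's relevance judgment with naive word overlap; `CATALOG`, `is_relevant`, and the three agent functions are hypothetical names chosen for illustration and do not come from the paper. The point is only the divide-and-conquer shape: each level narrows the search before handing off to the next.

```python
from typing import Dict, List

# Hypothetical toy catalog: category -> tool -> API names.
CATALOG: Dict[str, Dict[str, List[str]]] = {
    "Weather": {
        "forecast_tool": ["get_daily_forecast", "get_hourly_forecast"],
        "air_tool": ["get_air_quality"],
    },
    "Finance": {
        "stocks_tool": ["get_stock_price", "get_stock_history"],
    },
}


def is_relevant(query: str, name: str) -> bool:
    """Stand-in for an LLM judgment: true if query and name share a word."""
    query_words = set(query.lower().split())
    name_words = set(name.lower().replace("_", " ").split())
    return bool(query_words & name_words)


def tool_agent(query: str, prefix: str, apis: List[str]) -> List[str]:
    """Lowest level: pick relevant APIs within a single tool."""
    return [f"{prefix}/{api}" for api in apis if is_relevant(query, api)]


def category_agent(query: str, category: str,
                   tools: Dict[str, List[str]]) -> List[str]:
    """Middle level: pick relevant tools, then delegate to tool agents."""
    candidates: List[str] = []
    for tool, apis in tools.items():
        if is_relevant(query, tool):
            candidates.extend(tool_agent(query, f"{category}/{tool}", apis))
    return candidates


def meta_agent(query: str,
               catalog: Dict[str, Dict[str, List[str]]]) -> List[str]:
    """Top level: pick relevant categories, then delegate downward."""
    candidates: List[str] = []
    for category, tools in catalog.items():
        if is_relevant(query, category):
            candidates.extend(category_agent(query, category, tools))
    return candidates


candidates = meta_agent("daily weather forecast for paris", CATALOG)
print(candidates)
```

Because irrelevant categories and tools are pruned early, only a small part of the API space is inspected at the lowest level, which is the motivation for a hierarchical retriever over a flat scan of all APIs.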
**Evaluation:**
The evaluation revisits the ToolLLM protocol and identifies a limitation that leads to an artificially high pass rate. A revised protocol is proposed, and an additional benchmark, AnyToolBench, is introduced. Experiments show that AnyTool outperforms strong baselines, achieving state-of-the-art performance across various datasets.
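As a rough illustration of why the counting rule behind a pass rate matters, the sketch below computes it two ways over a hypothetical set of outcomes: one that credits queries judged unsolvable, and one that counts only queries actually solved. The outcome labels, the crediting rule, and the numbers are assumptions made for illustration; they are not the ToolLLM protocol or the revised one.

```python
from typing import List


def pass_rate(outcomes: List[str], credit_unsolvable: bool) -> float:
    """Fraction of queries counted as passed.

    outcomes holds one label per query: "solved", "failed", or
    "unsolvable" (hypothetical labels for this sketch).
    """
    if not outcomes:
        raise ValueError("outcomes must not be empty")
    passed = sum(
        1
        for outcome in outcomes
        if outcome == "solved"
        or (credit_unsolvable and outcome == "unsolvable")
    )
    return passed / len(outcomes)


# Hypothetical results for ten queries.
outcomes = ["solved"] * 4 + ["failed"] * 4 + ["unsolvable"] * 2

lenient = pass_rate(outcomes, credit_unsolvable=True)
strict = pass_rate(outcomes, credit_unsolvable=False)
print(f"lenient: {lenient:.0%}, strict: {strict:.0%}")
```

With these made-up outcomes the lenient rule reports a higher pass rate than the strict one for the same agent behavior, which is the kind of inflation a revised protocol would aim to remove.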
**Conclusion:**
AnyTool is an agent capable of harnessing 16,000+ APIs to handle realistic user queries. Its core is a hierarchical API retriever coupled with a solver, enhanced by a self-reflection mechanism. Experiments across various datasets show that it outperforms established baselines. Future research directions include optimizing API organization and developing more sophisticated self-reflection mechanisms.