18 Mar 2024 | Zora Zhiruo Wang, Zhoujun Cheng, Hao Zhu, Daniel Fried, Graham Neubig
This survey explores the concept of tools from the perspective of language models (LMs), aiming to define tools and understand their role in enhancing LM capabilities. LMs are primarily designed for text generation and struggle with tasks that require complex skills or access to external information. Tools, defined as external programs that LMs call, help overcome these limitations by enabling tasks such as mathematical calculation, real-time data retrieval, and interaction with the environment.
The survey provides a unified definition of tools and categorizes their functions into perception, action, and computation. It reviews tool-use scenarios, including knowledge access, computation activities, and interaction with the world, and discusses the efficiency of different tooling methods. It also highlights challenges in tool evaluation and proposes metrics for assessing tool performance. Advanced tool-use methods are analyzed as well, including multi-tool selection, complex tooling in programmatic contexts, and tool creation. The survey stresses the importance of evaluating tool effectiveness, noting that tools can significantly improve performance on some tasks but may be unnecessary for others. The paper concludes that tools can greatly extend and facilitate LM abilities, and encourages further research into benchmarking, evaluation metrics, and realistic tool-use scenarios.
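To make the idea of "an external program used by an LM" concrete, here is a minimal sketch of the basic mechanism: the LM emits a tool call inside its generated text, a runtime parses and executes that call outside the model, and the result is spliced back into the text. Everything here is an illustrative assumption, not the survey's method: the `[ToolName(argument)]` call syntax, the tool names `Calculator` and `Lookup`, and the helper `execute_tool_calls` are hypothetical, no actual LM is invoked, and the lookup table is a toy stand-in for a real retrieval system. The two tools loosely mirror the computation and knowledge-access scenarios described above.

```python
import ast
import operator
import re
from typing import Callable, Dict

# Hypothetical inline tool-call syntax assumed for illustration: [ToolName(argument)]
TOOL_CALL = re.compile(r"\[(\w+)\((.*?)\)\]")

# Whitelisted arithmetic operators, so the calculator never runs arbitrary code via eval().
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}


def _eval(node: ast.AST) -> float:
    """Recursively evaluate a parsed arithmetic expression tree."""
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.operand))
    raise ValueError(f"unsupported expression: {ast.dump(node)}")


def calculator(expr: str) -> str:
    """Hypothetical computation tool: evaluates arithmetic outside the LM."""
    return str(_eval(ast.parse(expr, mode="eval")))


def lookup(key: str) -> str:
    """Hypothetical knowledge-access tool: a toy in-memory table standing in for search."""
    table = {"capital of France": "Paris"}
    return table.get(key, "unknown")


# Registry mapping tool names (as the LM would write them) to Python functions.
TOOLS: Dict[str, Callable[[str], str]] = {"Calculator": calculator, "Lookup": lookup}


def execute_tool_calls(lm_output: str) -> str:
    """Replace every recognised tool call in the LM output with that tool's result."""

    def run(match: re.Match) -> str:
        name, arg = match.group(1), match.group(2)
        tool = TOOLS.get(name)
        # Unknown tool names are left untouched rather than raising an error.
        return tool(arg) if tool else match.group(0)

    return TOOL_CALL.sub(run, lm_output)


if __name__ == "__main__":
    print(execute_tool_calls("The order costs [Calculator(3*12.5)] dollars."))
    # The order costs 37.5 dollars.
    print(execute_tool_calls("The [Lookup(capital of France)] office opens at 9."))
    # The Paris office opens at 9.
```

The calculator parses expressions with `ast` and evaluates only a whitelist of operators instead of calling `eval`, since text produced by an LM should not be trusted to run as arbitrary code. Real systems differ in how calls are formatted (for example, structured JSON rather than inline brackets) and in whether results are fed back to the model for further generation; this sketch only shows the single parse, execute, and substitute step.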