9 Feb 2023 | Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, Thomas Scialom
Toolformer is a language model that learns to use external tools via simple API calls in a self-supervised manner. It is trained to decide which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction. This is achieved with minimal human annotations, requiring only a few demonstrations for each API. Toolformer incorporates a range of tools, including a calculator, a question answering system, a search engine, a translation system, and a calendar. It achieves improved zero-shot performance across various downstream tasks, often competitive with larger models, without sacrificing its core language modeling abilities.
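Concretely, tool use is represented inline in the text itself: an API call and its result appear as a plain-text span of the form `[Tool(args) -> result]`, so the model learns to emit calls during ordinary decoding. The sketch below is only an illustration of that idea, not the paper's code; the regex, the toy tool registry, and the `execute_calls` helper are all assumptions.

```python
import re

# Toy tool registry; the paper routes a calculator, QA system, search
# engine, translator, and calendar through a uniform call interface.
TOOLS = {
    "Calculator": lambda expr: str(eval(expr)),  # eval only as a toy calculator
}

# Matches the inline call syntax "[Tool(args)]" before a result is filled in.
CALL_PATTERN = re.compile(r"\[(\w+)\((.*?)\)\]")

def execute_calls(text: str) -> str:
    """Replace each "[Tool(args)]" span with "[Tool(args) -> result]"."""
    def run(match: re.Match) -> str:
        tool, args = match.group(1), match.group(2)
        return f"[{tool}({args}) -> {TOOLS[tool](args)}]"
    return CALL_PATTERN.sub(run, text)

print(execute_calls(
    "Out of 1400 participants, 400 (or [Calculator(400/1400)] 29%) passed."
))
```

Because the fine-tuned model has seen exactly this kind of annotated text, at inference time decoding can pause when a call is generated, the tool runs, and the result is inserted before generation continues.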
The paper introduces Toolformer, a model that learns to use tools in a novel way, fulfilling the following desiderata: (1) the use of tools should be learned in a self-supervised way, without requiring large amounts of human annotations, and (2) the LM should not lose any of its generality and should be able to decide for itself when and how to use which tool. The approach builds on large LMs with in-context learning to generate datasets from scratch: given a few human-written examples of how an API can be used, the model annotates a huge language modeling dataset with potential API calls. A self-supervised loss then determines which of these calls actually help the model predict future tokens, and the LM is fine-tuned on the API calls it considers useful.
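That filtering criterion can be made concrete: a sampled API call is kept only if prefixing the executed call, result included, lowers the model's loss over the subsequent tokens by at least a threshold τ, compared with the better of two baselines (no call at all, or the call without its result). A minimal sketch under those assumptions follows; the `keep_api_call` helper and `loss_fn` signature are hypothetical, and the paper's actual criterion uses a weighted cross-entropy over future tokens.

```python
def keep_api_call(loss_fn, prefix, call, result, continuation, tau=1.0):
    """Keep an API call only if its executed result helps predict the continuation.

    loss_fn(context, continuation) is assumed to return the LM's
    (weighted) cross-entropy over `continuation` given `context`.
    """
    # Loss when the executed call, result included, is prefixed to the context.
    loss_with_result = loss_fn(prefix + f"[{call} -> {result}] ", continuation)
    # Baseline: the better of (a) no call at all and (b) the call without its result.
    loss_baseline = min(
        loss_fn(prefix, continuation),
        loss_fn(prefix + f"[{call}] ", continuation),
    )
    # Keep the call only if it reduces the loss by at least tau.
    return loss_baseline - loss_with_result >= tau

# Toy usage with a stand-in loss function (a real one would query the LM):
toy_loss = lambda ctx, cont: 7.5 if "->" in ctx else 10.0
print(keep_api_call(toy_loss, "Pittsburgh is also known as ",
                    "QA('Which other name is Pittsburgh known by?')",
                    "Steel City", "the Steel City.", tau=1.0))  # True
```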
Toolformer is evaluated on various downstream tasks, including LAMA, math datasets, question answering, multilingual question answering, and temporal datasets. It outperforms baselines, including larger models like GPT-3, on most tasks, and it maintains strong language modeling performance, as measured by perplexity on language modeling datasets. The approach is agnostic to the underlying corpus, so it can be applied to the same dataset used for pretraining. Toolformer is effective across different model sizes, with gains growing as the model size increases. However, it has limitations: tools cannot be used in a chain, the model is sensitive to the exact wording of its input, and the method is sample-inefficient for some tools. Overall, Toolformer demonstrates the potential of self-supervised learning for enabling language models to use external tools effectively.