9 Feb 2023 | Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, Thomas Scialom
**Toolformer: Language Models Can Teach Themselves to Use Tools**
This paper introduces Toolformer, a language model that learns to use external tools such as search engines, calculators, and translation systems through simple API calls. The model is trained in a self-supervised manner, requiring only a handful of demonstrations per API. Toolformer decides which APIs to call, when to call them, what arguments to pass, and how to incorporate the results into future token prediction. The approach relies on in-context learning: the model annotates a large language modeling dataset with candidate API calls, the calls are executed, and only those whose results reduce the loss on future tokens are kept. The model is then finetuned on this augmented dataset, learning to use tools while retaining its core language modeling abilities. Experiments show that Toolformer achieves substantially improved zero-shot performance across a range of downstream tasks, often outperforming much larger models such as GPT-3, without sacrificing its language modeling capabilities. The paper also discusses limitations and future directions, including the current inability to use tools in a chain or interactively.
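The heart of the method is the self-supervised filtering step: a candidate API call survives only if inserting the call together with its result lowers the model's loss on the tokens that follow, by at least a threshold, compared to both leaving the text unannotated and inserting the call without its result. Below is a minimal Python sketch of that criterion; the `keep_api_call` helper, the `loss` callable, and the toy loss values are illustrative assumptions, not the paper's actual code.

```python
# Minimal sketch of Toolformer's self-supervised filtering step.
# `loss(prefix, continuation)` is a hypothetical stand-in for the model's
# weighted cross-entropy loss on `continuation` given `prefix`.

def keep_api_call(loss, text_before, text_after, call, result, tau=1.0):
    """Keep a candidate API call only if inserting the call *and* its result
    lowers the LM loss on the following tokens by at least `tau`, compared to
    both (a) no call at all and (b) the call without its result:
        L(call + result) <= min(L(no call), L(call only)) - tau
    """
    l_with_result = loss(text_before + f"[{call} -> {result}] ", text_after)
    l_call_only = loss(text_before + f"[{call}] ", text_after)
    l_no_call = loss(text_before, text_after)
    return l_with_result <= min(l_no_call, l_call_only) - tau


if __name__ == "__main__":
    # Toy usage with a fake loss table standing in for a real language model,
    # based on the paper's running example.
    before = "Pittsburgh is also known as "
    call = 'QA("What other name is Pittsburgh known by?")'
    fake_losses = {
        before: 4.1,                                    # no call
        before + f"[{call}] ": 4.0,                     # call, no result
        before + f"[{call} -> Steel City] ": 1.2,       # call + result
    }
    loss = lambda prefix, continuation: fake_losses[prefix]
    print(keep_api_call(loss, before, "the Steel City.", call, "Steel City"))
    # True: the executed call lowers the loss by more than tau.
```

In the full pipeline, the calls that pass this filter are merged back into the corpus, and the model is finetuned on the resulting augmented dataset.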