26 Jun 2024 | Zuxin Liu, Thai Hoang, Jianguo Zhang, Ming Zhu, Tian Lan, Shirley Kokane, Juntao Tan, Weiran Yao, Zhiwei Liu, Yihao Feng, Rithesh Murthy, Liangwei Yang, Silvio Savarese, Juan Carlos Niebles, Huan Wang, Shelby Heinecke, Caiming Xiong
APIGen is an automated data generation pipeline designed to create verifiable and diverse function-calling datasets for training function-calling agent models. The pipeline applies a multi-stage verification process to ensure the quality and reliability of the generated data: format checking, actual function execution, and semantic verification. The authors collected 3,673 executable APIs across 21 categories and used APIGen to generate diverse function-calling datasets, which they then used to train two function-calling models: a 1.3B-parameter model and a 6.7B-parameter model. The 6.7B model achieved state-of-the-art performance on the Berkeley Function-Calling Benchmark, outperforming multiple GPT-4 models, while the 1.3B model surpassed GPT-3.5-Turbo and Claude-3 Haiku. The authors released a dataset of 60,000 high-quality entries, available on Huggingface and their project homepage. The paper also covers APIGen's contributions, related work, the framework design, dataset preparation, experiments, and conclusions.
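To make the multi-stage verification idea concrete, here is a minimal sketch of how the three stages could chain together. All function names, the JSON call schema, and the API registry are hypothetical illustrations, not the paper's actual implementation; in particular, the semantic stage here is a trivial stand-in for what the paper describes as a more involved semantic check.

```python
import json

def format_check(raw):
    """Stage 1: verify the output parses as a JSON function call
    with the required fields (hypothetical schema)."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(call, dict) or "name" not in call or "arguments" not in call:
        return None
    return call

def execution_check(call, api_registry):
    """Stage 2: actually execute the call against a registry of callable APIs;
    any exception means the sample fails verification."""
    fn = api_registry.get(call["name"])
    if fn is None:
        return None
    try:
        return fn(**call["arguments"])
    except Exception:
        return None

def semantic_check(query, call, result):
    """Stage 3: stand-in for semantic verification -- a real pipeline would
    judge whether the call and its result actually satisfy the query."""
    return result is not None

def verify(query, raw_output, api_registry):
    """Run the three stages in order; keep the sample only if all pass."""
    call = format_check(raw_output)
    if call is None:
        return False
    result = execution_check(call, api_registry)
    return semantic_check(query, call, result)

# Usage with a toy API registry
registry = {"add": lambda a, b: a + b}
good = '{"name": "add", "arguments": {"a": 2, "b": 3}}'
bad = '{"name": "add", "arguments"'  # malformed JSON fails stage 1
print(verify("What is 2 + 3?", good, registry))  # True
print(verify("What is 2 + 3?", bad, registry))   # False
```

The key design point this sketch mirrors is that each stage filters the candidate data before the next, more expensive stage runs, so only samples that are well-formed, executable, and plausibly correct survive into the training set.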