TROVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks

23 Jan 2024 | Zhiruo Wang, Graham Neubig, Daniel Fried
TROVE is a training-free method for inducing verifiable and efficient toolboxes of functions for solving programmatic tasks. It generates, grows, and periodically trims a toolbox of reusable functions, written in Python, and uses them to produce simpler and more accurate solutions than baselines. The method has three major components: using and growing a toolbox over time, execution agreement-based selection, and periodic trimming of low-utility functions. Because it relies on execution agreement, it requires no additional training or supervision.
On 11 datasets from math, table question answering, and image reasoning tasks, TROVE shows higher accuracy and reduced solution complexity compared with baselines using CODELLAMA and previous state-of-the-art methods using GPT, while using 79-98% smaller toolboxes. Human verification also improves: solutions generated by TROVE are 31% faster and 13% more accurate to verify than those of baseline methods. The functions TROVE induces are diverse and specialized across tasks and datasets, which offers insight into data-specific characteristics.
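
The three components described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Candidate`, `Toolbox`, `select_by_agreement`, `run`, `trim_every`, and `min_uses` are hypothetical, sampling from a language model is replaced by pre-computed candidates, and both the tie-break by fewest operations and the fixed usage threshold for trimming are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Candidate:
    """One sampled solution (hypothetical structure, not from the paper)."""
    answer: object        # execution result; None if execution failed
    functions_used: list  # names of toolbox functions the solution calls
    num_operations: int   # rough complexity proxy, used only to break ties


def select_by_agreement(candidates):
    """Return a candidate whose execution result is the most common one."""
    valid = [c for c in candidates if c.answer is not None]
    if not valid:
        return None
    # repr() lets unhashable results (lists, dicts) be counted too.
    counts = Counter(repr(c.answer) for c in valid)
    top_answer, _ = counts.most_common(1)[0]
    agreeing = [c for c in valid if repr(c.answer) == top_answer]
    # Assumption: among agreeing candidates, prefer the simplest one.
    return min(agreeing, key=lambda c: c.num_operations)


class Toolbox:
    """Tracks how often each function appears in a selected solution."""

    def __init__(self):
        self.usage = {}

    def record(self, candidate):
        # Growing: functions of the selected solution enter the toolbox.
        for name in candidate.functions_used:
            self.usage[name] = self.usage.get(name, 0) + 1

    def trim(self, min_uses):
        # Trimming: drop functions used fewer than min_uses times.
        self.usage = {n: k for n, k in self.usage.items() if k >= min_uses}


def run(stream, toolbox, trim_every=2, min_uses=2):
    """Process examples in order, trimming the toolbox every few steps."""
    answers = []
    for step, candidates in enumerate(stream, start=1):
        best = select_by_agreement(candidates)
        if best is not None:
            toolbox.record(best)
        answers.append(best.answer if best is not None else None)
        if step % trim_every == 0:
            toolbox.trim(min_uses)
    return answers
```

In this sketch, a function that appears in only one selected solution before a trimming step is removed, which is one simple way to keep the toolbox small; the actual utility criterion used by TROVE is not specified in this summary.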