Code Repair with LLMs gives an Exploration-Exploitation Tradeoff

30 May 2024 | Hao Tang, Keya Hu, Jin Peng Zhou, Sicheng Zhong, Wei-Long Zheng, Xujie Si, Kevin Ellis
This paper presents REx, a novel algorithm for iteratively refining code with large language models (LLMs) that addresses the exploration-exploitation tradeoff in program synthesis. The key idea is to frame the refinement process as a multi-armed bandit problem: each program is an "arm," and the goal is to balance refining programs that already look likely to be correct (exploitation) against refining less-tried programs (exploration). REx uses Thompson Sampling to make these decisions, maintaining probabilistic beliefs about the quality of each program and updating them based on feedback from LLM refinements.

The algorithm is designed to efficiently navigate the infinite tree of possible refinements, where each node represents a program and each edge represents an LLM refinement. REx prioritizes programs with high heuristic values, which indicate how close a program is to satisfying the specification, while also accounting for how many times each program has already been refined. This lets REx adaptively choose between exploring new programs and further refining existing ones, leading to more efficient code generation (see the sketch below).

Experiments across three domains (competition programming, visual reasoning, and software verification) show that REx outperforms existing methods in both the number of problems solved and the efficiency of LLM calls. REx is particularly effective on hard problems that other methods struggle with, and it achieves significant speedups in compute usage.
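To make the bandit framing concrete, here is a minimal Python sketch of such a refinement loop. The helper names (`llm_refine`, `evaluate`), the choice of heuristic (fraction of tests passed), and the exact prior shape Beta(1 + C·h(p), 1 + N(p)) are illustrative assumptions layered on the summary above, not a definitive account of the authors' implementation.

```python
import random

def rex(initial_programs, llm_refine, evaluate, C=5.0, budget=100):
    """Sketch of a Thompson-Sampling refinement loop over a pool of programs.

    initial_programs: seed programs sampled from the LLM
    llm_refine(p):    asks the LLM for a refinement of program p (one "arm pull")
    evaluate(p):      returns the heuristic h(p) in [0, 1],
                      e.g. the fraction of test cases p passes (assumption)
    C:                exploitation weight; larger C trusts the heuristic more
    budget:           maximum number of LLM refinement calls
    """
    # Each program is an arm: track its heuristic value and refinement count.
    pool = [{"program": p, "h": evaluate(p), "n": 0} for p in initial_programs]

    for _ in range(budget):
        # Thompson Sampling: draw a belief about each arm's promise from a
        # Beta whose pseudo-successes grow with the heuristic and whose
        # pseudo-failures grow with how often the arm was already refined.
        def sample(arm):
            return random.betavariate(1 + C * arm["h"], 1 + arm["n"])

        arm = max(pool, key=sample)
        arm["n"] += 1  # this refinement counts against the arm next round

        child = llm_refine(arm["program"])
        h_child = evaluate(child)
        if h_child >= 1.0:  # all tests pass: done
            return child
        # The child joins the pool as a fresh, never-refined arm.
        pool.append({"program": child, "h": h_child, "n": 0})

    # Budget exhausted: return the best program found so far.
    return max(pool, key=lambda a: a["h"])["program"]
```

Intuitively, the C·h(p) term acts as pseudo-successes, pulling the sampler toward high-scoring programs (exploitation), while each refinement of a program adds a pseudo-failure, gradually steering the sampler toward less-tried programs (exploration).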
The algorithm is also robust to variations in its hyperparameters, making it reliable across different datasets and problem types. The paper also discusses REx's limitations, including its dependence on the quality of the base LLM: less capable models benefit less from the algorithm. The work contributes to program synthesis by introducing a bandit-based strategy for balancing exploration and exploitation in the refinement process.