30 May 2024 | Hao Tang, Keya Hu, Jin Peng Zhou, Sicheng Zhong, Wei-Long Zheng, Xujie Si, Kevin Ellis
The paper introduces a novel approach to code refinement using large language models (LLMs), focusing on the exploration-exploitation tradeoff in iterative code improvement. The authors propose a method called *REx* (Refine, Explore, Exploit), which frames the refinement process as an arm-acquiring bandit problem and solves it with Thompson Sampling. The approach balances exploring less-refined programs against exploiting programs that already pass many test cases. The method is evaluated across three domains: competition programming, visual reasoning puzzles, and loop invariant synthesis for software verification. *REx* solves more problems with fewer LLM calls than competing methods and is robust across datasets and hyperparameter settings. The paper also discusses the limitations of the approach and suggests directions for future research, including the potential benefits of using other LLMs and the need for further progress on the hardest problems.
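To make the bandit framing concrete, here is a minimal Python sketch (not the authors' implementation): it treats each candidate program as an arm with a Beta posterior whose parameters combine a test-pass heuristic with the number of unproductive refinement attempts. The `evaluate` and `refine` callables, the constant `c`, and the exact form of the prior are assumptions made for this illustration.

```python
import random
from dataclasses import dataclass

# Hypothetical sketch of Thompson-Sampling-based refinement selection.
# `heuristic` is the fraction of test cases a program passes; `c` controls
# how strongly that heuristic shapes the Beta prior. The exact prior used
# in REx may differ; this only illustrates the arm-acquiring bandit idea.

@dataclass
class Arm:
    program: str
    heuristic: float           # fraction of tests passed, in [0, 1]
    failed_refinements: int = 0


def sample_score(arm: Arm, c: float = 5.0) -> float:
    """Draw a Thompson sample from this program's Beta posterior."""
    alpha = 1.0 + c * arm.heuristic
    beta = 1.0 + c * (1.0 - arm.heuristic) + arm.failed_refinements
    return random.betavariate(alpha, beta)


def refine_loop(initial_program: str, evaluate, refine, budget: int = 100):
    """Iteratively pick the most promising program and ask the LLM to refine it.

    `evaluate(program)` -> fraction of tests passed (1.0 means solved).
    `refine(program)`   -> a new candidate program (e.g., one LLM call).
    Both are assumed to be supplied by the caller.
    """
    arms = [Arm(initial_program, evaluate(initial_program))]
    for _ in range(budget):
        # Thompson Sampling: sample a score per arm, refine the argmax.
        chosen = max(arms, key=sample_score)
        child_program = refine(chosen.program)
        child_heuristic = evaluate(child_program)
        if child_heuristic >= 1.0:
            return child_program            # all tests pass: solved
        # The bandit acquires a new arm for the freshly generated program.
        arms.append(Arm(child_program, child_heuristic))
        chosen.failed_refinements += 1      # parent becomes less attractive
    return None
```

In this sketch, a smaller `c` makes the sampled scores noisier relative to the heuristic, encouraging exploration of less-promising programs, while a larger `c` concentrates refinement on the current best candidates, which is one way to realize the exploration-exploitation tradeoff the paper studies.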