2024-4-24 | Ansong Ni, Miltiadis Allamanis, Arman Cohan, Yinlin Deng, Kensen Shi, Charles Sutton and Pengcheng Yin
The paper introduces NExT, a method to teach large language models (LLMs) to reason about code execution by inspecting execution traces and generating chain-of-thought (CoT) rationales in natural language. NExT uses self-training to bootstrap a synthetic training set of execution-aware rationales, improving both the fix rate and rationale quality on program repair tasks. Experiments with the PaLM 2-L model show 26.1% and 14.3% absolute improvements in fix rate on the MBPP-R and HumanEvalFix-Plus datasets, respectively. NExT also generalizes to scenarios without execution traces at test time, demonstrating its robustness and effectiveness in reasoning about program execution.
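To make the self-training loop concrete, here is a minimal Python sketch of one bootstrapping round as described above: the model sees the buggy program plus its execution trace, samples rationales and candidate fixes, and only samples whose fix passes the unit tests are kept for fine-tuning. All helpers (`run_with_trace`, `format_prompt`, `passes_tests`, `model.generate`, `model.finetune`) are hypothetical placeholders used for illustration, not the authors' implementation.

```python
def self_training_round(model, repair_tasks, samples_per_task=4):
    """One NExT-style bootstrapping round (illustrative sketch only)."""
    synthetic_dataset = []
    for task in repair_tasks:
        # 1. Run the buggy program on its failing tests to collect a textual execution trace.
        trace = run_with_trace(task.buggy_code, task.failing_tests)

        # 2. Prompt the model with code + trace; sample CoT rationales and candidate fixes.
        prompt = format_prompt(task.buggy_code, task.failing_tests, trace)
        for _ in range(samples_per_task):
            rationale, fixed_code = model.generate(prompt)

            # 3. Keep only samples whose fix passes the unit tests; the attached
            #    rationale is accepted as a weakly verified example of correct reasoning.
            if passes_tests(fixed_code, task.all_tests):
                synthetic_dataset.append((prompt, rationale, fixed_code))

    # 4. Fine-tune the model on its own verified rationales and fixes,
    #    bootstrapping execution-aware reasoning without human-written rationales.
    model.finetune(synthetic_dataset)
    return model
```

The key design choice, per the paper's description, is that test execution acts as the filter: no human labels of rationale quality are needed, since a rationale is retained only when it leads to a fix that actually passes the tests.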