NExT: Teaching Large Language Models to Reason about Code Execution

2024-04-23 | Ansong Ni, Miltiadis Allamanis, Arman Cohan, Yinlin Deng, Kensen Shi, Charles Sutton, and Pengcheng Yin
NExT is a method for teaching large language models (LLMs) to reason about program execution by inspecting execution traces and generating chain-of-thought (CoT) rationales. Execution traces are represented as inline code comments, a format that LLMs can readily understand. The model is trained with weakly-supervised self-training: it generates rationales and candidate fixes for a set of challenging problems, and only the rationales that lead to correct program fixes are kept as synthetic training data, allowing the model to improve its own reasoning skills without manually written rationales.

On program repair tasks derived from MBPP and HumanEval, NExT improves the fix rate of PaLM 2 by 26.1% and 14.3%, respectively, and it also generalizes to settings where execution traces are unavailable at test time. The generated rationales, which explain bugs and suggest fixes, are judged to be of high quality by both proxy-based evaluation and human raters. By improving both rationale quality and repair success rate, NExT offers a practical way to make LLMs more useful for developers debugging real programs.
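To make the "traces as inline comments" idea concrete, the sketch below annotates a buggy Python function with the variable state observed while running it on a failing input. This is only an illustration: the helper names (`trace_as_comments`, `buggy_sum_of_squares`) and the exact comment format are assumptions for this example, not the prompt format used in the paper.

```python
import inspect
import sys


def trace_as_comments(func, *args):
    """Run `func` on `args` and annotate each executed source line with the
    local variables observed just before that line runs (a rough sketch of
    rendering an execution trace as inline comments)."""
    source, start_lineno = inspect.getsourcelines(func)
    snapshots = {}  # absolute line number -> latest locals seen at that line

    def tracer(frame, event, arg):
        # 'line' events fire before the line executes, so each snapshot
        # reflects the state on entry to that line.
        if event == "line" and frame.f_code is func.__code__:
            snapshots[frame.f_lineno] = dict(frame.f_locals)
        return tracer

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)

    annotated = []
    for offset, line in enumerate(source):
        line = line.rstrip("\n")
        state = snapshots.get(start_lineno + offset)
        if state:
            line += "  # state: " + ", ".join(f"{k}={v!r}" for k, v in state.items())
        annotated.append(line)
    return "\n".join(annotated)


def buggy_sum_of_squares(nums):
    total = 0
    for n in nums:
        total += n  # bug: should be n * n
    return total


if __name__ == "__main__":
    # The annotated source, together with the failing test, is the kind of
    # context an LLM could use to produce a rationale and a fix.
    print(trace_as_comments(buggy_sum_of_squares, [1, 2, 3]))
```

In a self-training loop of the kind the summary describes, the model would be prompted with such an annotated program plus its failing tests, sample several (rationale, fix) pairs, and keep only the pairs whose fix actually repairs the program as new training examples.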