CREF: An LLM-based Conversational Software Repair Framework for Programming Tutors

16-20 September, 2024 | Boyang Yang, Haoye Tian, Weigu Pian, Haoran Yu, Haitao Wang, Jacques Klein, Tegawendé F. Bissyandé, Shunfu Jin
This paper introduces CREF, a conversational semi-automatic repair framework for programming tutors that leverages large language models (LLMs) and augmented information to enhance program repair. The authors first construct TutorCode, a benchmark of 1,239 incorrect C++ code samples, each paired with tutor guidance, a solution description, failing test cases, and corrected code.

Using TutorCode, the authors evaluate the repair performance of 12 LLMs, measuring repair correctness (TOP-5 and AVG-5) and patch precision (RPSR). GPT-4 and GPT-3.5 consistently outperform the other models. An ablation over the types of augmented information shows that tutor guidance is the most effective at improving LLM repair capability.

To fully exploit LLMs' conversational abilities together with augmented information, the authors propose CREF, which works alongside human programming tutors. CREF improves AVG-5 by 17.2%-24.6% over the baseline, reaching an AVG-5 of 76.6% with GPT-4. These results highlight the potential of enhancing LLM repair through tutor interaction and through conversation history that includes earlier incorrect responses. A deployment in a real-world educational setting shows that CREF reduces tutors' workload and improves students' learning experience, and suggests applicability to other software engineering tasks such as code review. Evaluated on a large-scale dataset, CREF also significantly reduces students' debugging time and cost.
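The paper reports repair correctness as TOP-5 and AVG-5 over five sampled patches per program. As a rough illustration of one plausible reading of these metrics (the exact definitions are the paper's; the function names and sample data below are hypothetical), TOP-5 can be taken as the fraction of programs where at least one of five candidate patches passes all tests, and AVG-5 as the mean pass rate across the five attempts:

```python
def top_k(results, k=5):
    """Fraction of programs where at least one of the first k
    candidate patches passes all tests (hedged reading of TOP-5)."""
    return sum(any(r[:k]) for r in results) / len(results)


def avg_k(results, k=5):
    """Mean pass rate over the first k candidate patches per
    program (hedged reading of AVG-5)."""
    return sum(sum(r[:k]) / k for r in results) / len(results)


# Illustrative data only: each inner list records pass/fail for the
# five patches sampled for one incorrect program.
results = [
    [True, False, True, True, False],   # repaired by some attempts
    [False, False, False, False, False],  # never repaired
    [True, True, True, True, True],     # always repaired
]
print(top_k(results))  # 2 of 3 programs repaired at least once
print(avg_k(results))
```

Under this reading TOP-5 rewards getting any one of the five samples right, while AVG-5 penalizes inconsistency across samples, which is why the two numbers can diverge.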
The study concludes that CREF is a promising approach for improving program repair in programming education scenarios.
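The framework's core idea is a feedback loop: the LLM proposes a patch, failing test cases and tutor guidance are fed back into the conversation, and the dialogue history (including earlier incorrect responses) informs the next attempt. The sketch below illustrates that loop only; `query_llm` and `run_tests` are stand-in stubs, not CREF's actual implementation:

```python
def query_llm(history):
    """Stub standing in for an LLM call; here it just fixes a typo
    in the latest message. A real system would call a chat model
    with the full conversation history."""
    return history[-1]["content"].replace("retrun", "return")


def run_tests(code):
    """Stub test oracle: returns failing-case descriptions,
    or an empty list if all tests pass."""
    return [] if "return" in code else ["test_1: expected a return statement"]


def conversational_repair(buggy_code, tutor_guidance, max_rounds=3):
    """Iteratively refine a patch, feeding failures and tutor
    guidance back into the dialogue each round."""
    history = [{"role": "user", "content": buggy_code}]
    for _ in range(max_rounds):
        patch = query_llm(history)
        failures = run_tests(patch)
        if not failures:
            return patch
        # Augmented information flows back into the conversation.
        feedback = "Failing tests: " + "; ".join(failures)
        if tutor_guidance:
            feedback += "\nTutor guidance: " + tutor_guidance
        history.append({"role": "assistant", "content": patch})
        history.append({"role": "user", "content": feedback})
    return None  # unresolved: escalate to the human tutor


fixed = conversational_repair("int f(){ retrun 1; }", "check the return keyword")
```

The semi-automatic aspect is captured by the `None` return: when the loop exhausts its rounds, the accumulated conversation is handed to the human tutor rather than discarded.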