CREF: An LLM-based Conversational Software Repair Framework for Programming Tutors


16-20 September, 2024, Vienna, Austria | Boyang Yang, Haoye Tian, Weiguo Pian, Haoran Yu, Haitao Wang, Jacques Klein, Tegawende F. Bissyandé, Shunfu Jin
The paper introduces CREF, a conversational software repair framework for programming tutors that leverages Large Language Models (LLMs) to improve program repair in programming education. The authors address limitations of existing LLM-based repair techniques, such as data leakage and high computational overhead, by proposing a novel benchmark dataset, *TutorCODE*, which comprises 1,239 defective C++ programs along with associated information: tutor guidance, solution descriptions, failing test cases, and corrected code. They evaluate 12 LLMs on *TutorCODE*, measuring repair correctness (TOP-5 and AVG-5) and patch precision (RPSR), and find that GPT-4 and GPT-3.5 consistently outperform the other LLMs on program repair tasks. The study also shows that tutor guidance is the most effective type of augmented information for enhancing LLMs' repair capabilities.

To further exploit LLMs' conversational capabilities and the benefits of augmented information, the authors introduce CREF, a semi-automatic conversational repair framework. CREF improves AVG-5 by 17.2%-24.6% over the baseline, reaching an AVG-5 of 76.6% with GPT-4. In a real-world educational setting, CREF reduces tutors' workload by 71.2% and costs by 69.9% while enhancing the learning experience for students. The paper also details the contributions, methodology, and experimental results, highlighting CREF's effectiveness in practical applications.
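The summary reports repair correctness as TOP-5 and AVG-5. Assuming the usual reading of these pass@k-style metrics — TOP-5 is the fraction of tasks where at least one of five sampled patches is correct, and AVG-5 is the mean fraction of correct patches per task — a minimal sketch of how they could be computed is (the function names and data layout here are illustrative, not from the paper):

```python
from typing import List


def top_k(results: List[List[bool]]) -> float:
    """Fraction of tasks where at least one of the k sampled patches passes all tests."""
    return sum(any(task) for task in results) / len(results)


def avg_k(results: List[List[bool]]) -> float:
    """Mean per-task fraction of passing patches, averaged over all tasks."""
    return sum(sum(task) / len(task) for task in results) / len(results)


# Hypothetical example: two tasks, five patch samples each.
# Task 1: all five patches correct; task 2: only one of five correct.
results = [
    [True, True, True, True, True],
    [True, False, False, False, False],
]
print(top_k(results))  # 1.0  (both tasks have at least one correct patch)
print(avg_k(results))  # 0.6  ((5/5 + 1/5) / 2)
```

Under this reading, TOP-5 rewards getting any one of the five attempts right, while AVG-5 penalizes inconsistency across samples, which is why the two numbers can diverge.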