7 Mar 2024 | Jiaolong Kong, Mingfei Cheng, Xiaofei Xie, Shangqing Liu, Xiaoning Du, Qi Guo
**ContrastRepair: Enhancing Conversation-Based Automated Program Repair via Contrastive Test Case Pairs**
**Authors:** Jiaolong Kong, Mingfei Cheng, Xiaofei Xie, Shangqing Liu, Xiaoning Du, Qi Guo
**Abstract:**
Automated Program Repair (APR) aims to automatically generate patches for fixing software bugs. Recent advancements in Large Language Models (LLMs), such as ChatGPT, have shown promising results in APR, particularly in conversation-driven APR frameworks. However, the effectiveness of these frameworks depends on the quality of the feedback information. This paper introduces *ContrastRepair*, a novel conversation-based APR approach that enhances feedback quality by providing LLMs with contrastive test pairs. A test pair consists of a failing test and a passing test, offering contrasting feedback to the LLM. The key insight is to minimize the difference between the generated passing test and the given failing test, which helps isolate the root causes of bugs. By providing informative and specific feedback, *ContrastRepair* enables LLMs to produce effective bug fixes. The implementation of *ContrastRepair* is based on the state-of-the-art LLM, ChatGPT, and it iteratively interacts with ChatGPT until plausible patches are generated. Evaluations on multiple benchmark datasets, including Defects4J, QuixBugs, and HumanEval-Java, demonstrate that *ContrastRepair* significantly outperforms existing methods, achieving a new state-of-the-art in program repair.
**Introduction:**
The increasing complexity of software has led to a rise in bugs and vulnerabilities, which can cause system failures, security breaches, and compromised user experiences. Manual debugging is time-consuming and labor-intensive, making APR a promising solution. Traditional APR techniques, such as template-based, heuristic-based, and constraint-based methods, have limitations in terms of generalization and efficiency. Machine learning techniques, particularly deep learning-based APR, have shown promise but still face challenges, such as the reliance on training data and the need for comprehensive datasets. The use of LLMs, trained on large datasets, has emerged as a promising approach, offering superior performance in APR tasks. *ContrastRepair* leverages LLMs in a conversation-driven manner, integrating both negative and positive feedback to enhance the quality of fixes.
**Methodology:**
*ContrastRepair* involves a conversation process that constructs prompts using contrastive test pairs, feeds these prompts to an LLM like ChatGPT, and receives responses leading to the repaired code. The process includes evaluating the program with a test suite, selecting suitable passing test cases to pair with the failing test, and constructing prompts that include the buggy function, test pairs, traceback information, and dependent functions. The LLM then generates the repaired code, which is validated in subsequent iterations. The repair process continues until plausible patches are identified or the repair budget is exhausted.
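The loop described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the similarity heuristic (`difflib` ratio over test source text), the function names, and the `run_tests`/`query_llm` callables are all assumptions introduced for illustration.

```python
import difflib


def select_passing_test(failing_test: str, passing_tests: list[str]) -> str:
    """Pick the passing test most textually similar to the failing test.

    Hypothetical heuristic: a small textual difference between the pair
    should help the LLM localize the root cause of the bug.
    """
    return max(
        passing_tests,
        key=lambda t: difflib.SequenceMatcher(None, failing_test, t).ratio(),
    )


def build_prompt(buggy_fn: str, failing: str, passing: str, traceback: str) -> str:
    """Assemble a prompt from the buggy function, the contrastive test pair,
    and traceback information (illustrative format)."""
    return (
        f"The following function is buggy:\n{buggy_fn}\n"
        f"This test fails:\n{failing}\nTraceback:\n{traceback}\n"
        f"This similar test passes:\n{passing}\n"
        "Please fix the function."
    )


def repair_loop(buggy_fn, tests, run_tests, query_llm, budget=10):
    """Iterate with the LLM until all tests pass or the budget is exhausted.

    `run_tests` and `query_llm` are caller-supplied callables (assumptions):
    `run_tests` returns (failing_tests, passing_tests, traceback).
    """
    candidate = buggy_fn
    for _ in range(budget):
        failing, passing, traceback = run_tests(candidate, tests)
        if not failing:
            return candidate  # plausible patch: all tests pass
        pair = select_passing_test(failing[0], passing)
        candidate = query_llm(build_prompt(candidate, failing[0], pair, traceback))
    return None  # repair budget exhausted
```

The test-pair selection is the part specific to *ContrastRepair*; the surrounding loop is the generic conversation-driven repair scheme shared with prior work.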
**Evaluation:**
*ContrastRepair* is evaluated on three benchmark datasets: Defects4J, QuixBugs, and HumanEval-Java.