Automated Program Repair, What Is It Good For? Not Absolutely Nothing!

April 14-20, 2024 | Hadeel Eladawy, Claire Le Goues, Yuriy Brun
Automated Program Repair (APR) automatically generates patches for software defects. This paper investigates how APR affects developers during debugging, through a controlled user study in which 40 developers each performed four debugging tasks drawn from nine real-world defects. The study drew on three state-of-the-art APR tools: Recoder, SimFix, and TBar. Developers were given either no suggestions, correct developer-written suggestions, correct APR-generated suggestions, or deceptive APR-generated suggestions that passed all tests but failed to fully repair the defect.
Access to code suggestions significantly increased the odds of submitting a patch. Access to correct APR suggestions increased the odds of debugging success by 14,000% compared to having access only to tests, while access to deceptive suggestions decreased the odds of success by 65%. Correct suggestions also sped up debugging. Participants were often able to recognize deceptive suggestions as lower quality, but 67.5% still submitted overfitting patches, that is, patches that pass all tests but fail to generalize to the intended behavior.
Surprisingly, novice and experienced developers were not affected significantly differently by APR: no quantitative evidence was found that more experienced developers were better at distinguishing between high- and low-quality patches or benefited more from APR. Developers also had a strong positive impression of APR, indicating its potential for human-driven debugging. Overall, the study shows that APR can benefit developers with a range of experience, even though APR-generated patches are not always correct.
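To put the two percentage figures in perspective, they can be converted into multiplicative factors. The following is only a worked conversion, assuming the percentages describe changes in odds (as the "odds of submitting a patch" wording suggests). The baseline success probability of 0.10 is a hypothetical value chosen for illustration and is not taken from the study.

```latex
% Percentage change -> multiplicative factor on the odds
\[
  f_{\text{correct}} = 1 + \frac{14{,}000}{100} = 141,
  \qquad
  f_{\text{deceptive}} = 1 - \frac{65}{100} = 0.35
\]
% Odds and probability are related by o = p / (1 - p) and p = o / (1 + o).
% Hypothetical baseline: p_0 = 0.10, so o_0 = 0.10 / 0.90 = 1/9.
\[
  o_{\text{correct}} = 141 \cdot \tfrac{1}{9} \approx 15.67
  \;\Rightarrow\;
  p_{\text{correct}} = \frac{15.67}{16.67} = 0.94
\]
\[
  o_{\text{deceptive}} = 0.35 \cdot \tfrac{1}{9} \approx 0.039
  \;\Rightarrow\;
  p_{\text{deceptive}} = \frac{0.039}{1.039} \approx 0.037
\]
```

Under that assumed baseline, a 14,000% increase in odds does not mean a 141-fold increase in the success rate; the probability of success rises from 0.10 to 0.94, because odds and probabilities scale differently.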
The results suggest that APR may find uses across the experience spectrum, and that developers may benefit from using APR in their debugging processes.
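The idea of a deceptive, overfitting patch can be made concrete with a small sketch. This example is not from the study, whose defects come from real-world projects; the leap-year function, the test suite, and all names here (`run_tests`, `deceptive_patch`, `correct_patch`) are hypothetical and chosen only to show how a patch can pass every available test while still missing the intended behavior.

```python
def run_tests(is_leap_year):
    """Hypothetical, deliberately weak test suite for a leap-year defect."""
    return (
        is_leap_year(2024) is True
        and is_leap_year(2023) is False
        and is_leap_year(2000) is True
    )


def deceptive_patch(year):
    # Overfitting patch: satisfies every test above,
    # but ignores the century rule of the Gregorian calendar.
    return year % 4 == 0


def correct_patch(year):
    # Intended behavior: divisible by 4, except centuries
    # that are not divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


if __name__ == "__main__":
    # Both patches pass the test suite.
    print(run_tests(deceptive_patch), run_tests(correct_patch))
    # Only an input outside the test suite tells them apart.
    print(deceptive_patch(1900), correct_patch(1900))
```

Both patches are indistinguishable to the test suite; only an input the tests never exercise (here, the year 1900) reveals the difference. This is why a developer reviewing a suggestion cannot rely on passing tests alone.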