Improving Automated Code Reviews: Learning from Experience

6 Feb 2024 | Hong Yi Lin, Patanamon Thongtanunam, Christoph Treude, Wachiraphan Charoenwet
This paper investigates the effectiveness of experience-aware oversampling in improving the quality of automated code reviews. Modern code review is a critical process for maintaining software quality, but it can be demanding and time-consuming for reviewers. To address this, automated code review models have been developed to mimic human reviewers. However, these models typically treat all review examples equally, regardless of the reviewer's experience.

The authors propose training models on a larger share of examples from experienced reviewers, who are more likely to provide deeper insights and more valuable feedback. Their technique, experience-aware oversampling, trains the model with a higher proportion of examples from experienced reviewers, thereby increasing those reviewers' influence on the model's behavior.

The study evaluates this approach through quantitative and qualitative methods. The results show that the experience-aware oversampling models generate more semantically correct comments, provide more applicable suggestions, and include more explanations than the original model. The models are also better at identifying critical issues such as logic, validation, and resource management. The authors conclude that experience-aware oversampling can substantially enhance the quality of automated code reviews without requiring additional data, highlighting the underutilization of high-quality reviews in current training strategies. Future work will explore further improvements and the behavioral differences introduced by this technique.
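To make the core idea concrete, here is a minimal sketch of what experience-aware oversampling could look like in practice. It assumes each training example is paired with a reviewer experience score (for instance, the number of prior reviews in the project); the threshold, duplication factor, and function name are illustrative assumptions, not the paper's exact setup.

```python
import random

def oversample_by_experience(examples, experience, threshold=50, factor=3, seed=0):
    """Build a training set in which examples from experienced reviewers
    appear more often, without collecting any new data.

    examples   : list of (code_diff, review_comment) training pairs
    experience : parallel list of reviewer experience scores
                 (e.g. number of prior reviews; the exact measure is assumed)
    threshold  : minimum score to count as "experienced" (assumed value)
    factor     : how many copies of each experienced example to keep (assumed value)
    """
    oversampled = []
    for example, score in zip(examples, experience):
        # Duplicate examples from experienced reviewers; keep others once.
        copies = factor if score >= threshold else 1
        oversampled.extend([example] * copies)
    # Shuffle so duplicated examples are not clustered during training.
    random.Random(seed).shuffle(oversampled)
    return oversampled
```

In this sketch, the model itself is unchanged; only the sampling distribution of the training data shifts toward reviews written by experienced reviewers, which is how the technique increases their influence on the learned behavior.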