Resolving Code Review Comments with Machine Learning


April 14-20, 2024 | Alexander Frömmgen, Jacob Austin, Peter Choy, Nimesh Ghalani, Lera Kharatyan, Gabriela Surita, Elena Khrapko, Pascal Lamblin, Pierre-Antoine Manzagol, Marcus Revaj, Maxim Tabachnyk, Daniel Tarlow, Kevin Villela, Daniel Zheng, Satish Chandra, Petros Maniatis
This paper presents a system that uses machine learning (ML) to automatically resolve code-review comments in Google's software development workflow. Code reviews are a critical part of the software development process, taking significant time from both code authors and reviewers. At Google, millions of reviewer comments are generated annually, and authors spend an average of 60 minutes actively addressing them between sending a change for review and final submission; this time grows almost linearly with the number of comments. ML offers an opportunity to streamline the process, for example by proposing code changes based on a comment's text.

The authors describe their application of recent advances in large sequence models, in a real-world setting, to automatically resolve code-review comments in the day-to-day development workflow at Google. They present the evolution of this feature from asynchronous generation of suggested edits after the reviewer sends feedback to an interactive experience that suggests code edits to the reviewer at review time. In deployment, code-change authors at Google address 7.5% of all reviewer comments by applying an ML-suggested edit, which at Google scale reduces the time spent on code reviews by hundreds of thousands of engineer hours annually. Unsolicited, very positive feedback highlights that ML-suggested code edits increase Googlers' productivity and allow them to focus on more creative and complex tasks. The system is an ML-based assistant that suggests a code edit addressing each reviewer comment; it was built, tuned, and deployed, and has been in production use at Google for several months.
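As a rough illustration only (not the paper's actual model interface), resolving a comment can be framed as a sequence-to-sequence task: the model consumes the reviewer's comment together with the commented code region and produces a replacement snippet, which is then spliced back into the file. All names and the prompt format below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ReviewComment:
    file_path: str
    line_range: tuple[int, int]  # 1-indexed, inclusive
    text: str                    # the reviewer's comment

def build_model_input(comment: ReviewComment, file_lines: list[str]) -> str:
    """Serialize the comment and the commented code region into one
    prompt string for a sequence model (hypothetical format)."""
    start, end = comment.line_range
    region = "\n".join(file_lines[start - 1:end])
    return (
        f"FILE: {comment.file_path}\n"
        f"COMMENT: {comment.text}\n"
        f"CODE:\n{region}\n"
        "EDIT:\n"
    )

def apply_suggested_edit(file_lines: list[str],
                         line_range: tuple[int, int],
                         replacement: str) -> list[str]:
    """Splice the model's suggested snippet over the commented region."""
    start, end = line_range
    return file_lines[:start - 1] + replacement.splitlines() + file_lines[end:]
```

For example, a comment "use a named constant" on line 2 of a two-line function would yield a prompt containing that line, and the model's suggested rewrite would replace line 2 in the file.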
The paper's contributions include the careful curation of a training dataset drawn from tens of millions of code reviews, lessons learned from tuning the model, the resulting assistant design, and the assistant's user-interface experience, along with qualitative and quantitative results on the deployed assistant's positive impact. The system is designed to improve reviewers' ability to give actionable, precise suggestions and code authors' ability to address such suggestions effectively. The assistant is used by thousands of Google engineers every day, and its performance is measured through offline evaluation and user feedback, with the goal of improving its quality and usability. It has been deployed to a large portion of Google's engineering population, with the latest version, V2, achieving a 7.5% resolution rate for code-review comments and generating detailed rewrites that address a wide range of reviewer comments. The system has been well received by Google engineers, with enthusiastic feedback and a positive impact on productivity.
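For concreteness, the headline resolution-rate metric can be read as the fraction of all reviewer comments that authors resolved by applying an ML-suggested edit. A minimal sketch, where the per-comment field name is an assumption rather than the paper's schema:

```python
def resolution_rate(comments: list[dict]) -> float:
    """Fraction of reviewer comments resolved by applying an
    ML-suggested edit. Each dict is assumed to carry an
    'applied_ml_edit' boolean (hypothetical field name)."""
    if not comments:
        return 0.0
    applied = sum(1 for c in comments if c.get("applied_ml_edit"))
    return applied / len(comments)

# e.g. 3 applied edits out of 40 comments gives 0.075,
# i.e. the 7.5% rate reported for V2
```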