AutoCodeRover: Autonomous Program Improvement

September 16-20, 2024 | Yuntong Zhang, Haifeng Ruan, Zhiyu Fan, Abhik Roychoudhury
AutoCodeRover is an automated approach for solving GitHub issues to achieve program improvement. It combines large language models (LLMs) with sophisticated code search capabilities to generate program modifications, or patches. Unlike recent LLM agent approaches, AutoCodeRover is software-engineering-oriented: it works with program representations such as abstract syntax trees (ASTs) rather than treating a software project as a mere collection of files. Its code search exploits program structure in the form of classes and methods to improve the LLM's understanding of the issue's root cause and to retrieve relevant context effectively through iterative search. When a test suite is available, spectrum-based fault localization further sharpens the retrieved context.

Experiments on SWE-bench-lite (300 real-life GitHub issues) show increased efficacy in resolving GitHub issues: AutoCodeRover resolves 19% of SWE-bench-lite, higher than SWE-agent. It resolved 57 GitHub issues in about 4 minutes each (pass@1), whereas developers spent more than 2.68 days on average, and it achieved this efficacy at significantly lower cost ($0.43 USD per issue on average). This workflow is a step toward autonomous software engineering, in which auto-generated code from LLMs can itself be autonomously improved.

AutoCodeRover works by first analyzing the issue's natural-language description to extract keywords that may correspond to files, classes, methods, or code snippets in the codebase. These keywords are used to invoke multiple code search APIs at once, with keyword combinations as arguments (e.g., search_method_in_file). The code search APIs run locally on top of AST analysis and retrieve code context, such as class signatures and method implementations, from particular locations in the codebase. As project context is collected through these APIs, the LLM refines its understanding of the issue based on the context available so far. In each iteration, the LLM agent directs the navigation and decides which code search APIs to use (i.e., where and what code to retrieve) based on the context returned by previous API calls. AutoCodeRover then asks whether the collected project context is sufficient, and subsequently uses it to derive the buggy locations. Patch construction is handled by another LLM agent, which considers the buggy locations together with all of the context collected so far for those locations.
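To make the code search layer concrete, here is a minimal sketch of an AST-backed search API in the spirit of search_method_in_file, assuming a Python codebase. Only the API name comes from the description above; the implementation details are illustrative, not AutoCodeRover's actual code.

```python
import ast

def search_method_in_file(method_name: str, file_path: str) -> list[str]:
    """Illustrative sketch: return the source of every function or method
    named `method_name` in `file_path`, located via AST analysis rather
    than plain text matching."""
    with open(file_path, "r", encoding="utf-8") as f:
        source = f.read()
    tree = ast.parse(source)
    matches = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == method_name:
            # get_source_segment recovers the exact code snippet for this node
            matches.append(ast.get_source_segment(source, node))
    return matches
```

Because the search is AST-based, the agent receives structurally meaningful units (whole method bodies, class signatures) rather than arbitrary line ranges.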
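The iterative context-retrieval loop described above can be pictured roughly as follows. This is a hypothetical sketch rather than the tool's actual implementation: the llm.select_api_calls, llm.is_context_sufficient, and llm.identify_buggy_locations helpers are invented names standing in for calls to the underlying language model.

```python
def retrieve_context(issue_text: str, search_apis: dict, llm, max_rounds: int = 5):
    """Hypothetical sketch of the iterative retrieval loop: the LLM picks
    code search API calls each round until it judges the context sufficient."""
    context = []
    for _ in range(max_rounds):
        # The LLM proposes a batch of API calls, e.g.
        # [("search_method_in_file", ("separability_matrix", "separable.py"))]
        api_calls = llm.select_api_calls(issue_text, context)
        for api_name, args in api_calls:
            context.append(search_apis[api_name](*args))
        if llm.is_context_sufficient(issue_text, context):
            break
    # With sufficient context, the LLM pinpoints the buggy locations,
    # which are then handed to a separate patch-generation agent.
    return llm.identify_buggy_locations(issue_text, context)
```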
AutoCodeRover can also leverage debugging techniques such as spectrum-based fault localization (SBFL) to choose more precise code search API calls for context retrieval when a test suite accompanies the project. SBFL primarily considers the control flow of passing and failing tests and assigns a suspiciousness score to each method of the program. Given a fault localization result, the LLM agent may prioritize retrieving context from particular methods and classes, for example when a method appears both in the issue description and in the fault localization output. In the last step, AutoCodeRover may perform patch validation using the available tests, to determine whether the produced patch makes the test suite pass.
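The suspiciousness scores mentioned above can be computed with any standard SBFL formula; the sketch below uses Ochiai purely as an illustrative choice (the summary does not name a specific formula) and assumes per-method test coverage is available.

```python
from math import sqrt

def ochiai_scores(coverage: dict[str, set[str]],
                  failing: set[str],
                  passing: set[str]) -> dict[str, float]:
    """Illustrative SBFL scoring: `coverage` maps each method to the set of
    test names that execute it. Higher score = more suspicious."""
    scores = {}
    total_failing = len(failing)
    for method, tests in coverage.items():
        ef = len(tests & failing)   # failing tests that execute the method
        ep = len(tests & passing)   # passing tests that execute the method
        denom = sqrt(total_failing * (ef + ep))
        scores[method] = ef / denom if denom else 0.0
    return scores
```

Methods that rank high here and also appear in the issue text are natural first targets for the code search APIs.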
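Patch validation can be as simple as applying the candidate patch and re-running the project's tests. The sketch below is a hedged illustration assuming a git repository with a pytest-based test suite; the actual validation step may differ.

```python
import subprocess

def validate_patch(repo_dir: str, patch_file: str) -> bool:
    """Illustrative sketch: apply a candidate patch and re-run the tests.
    Returns True if the test suite passes after the patch is applied."""
    applied = subprocess.run(["git", "apply", patch_file], cwd=repo_dir)
    if applied.returncode != 0:
        return False  # patch does not even apply cleanly
    tests = subprocess.run(["pytest", "-x"], cwd=repo_dir)
    return tests.returncode == 0
```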