11 Jun 2024 | Dong Chen, Shaoxin Lin, Muhan Zeng, Daoguang Zan, Jian-Gang Wang, Anton Cheshkov, Jun Sun, Hao Yu, Guoliang Dong, Artem Aliev, Jie Wang, Xiao Cheng, Guangtai Liang, Yuchi Ma, Pan Bian, Tao Xie, Qianxiang Wang
CODER is a multi-agent framework that uses task graphs to resolve GitHub issues. It approaches issue resolution in a structured way, with predefined tasks and roles split across five agents: Manager, Reproducer, Fault Localizer, Editor, and Verifier. The Manager selects a plan and executes it, the Reproducer generates tests that reproduce the issue, the Fault Localizer identifies the code regions responsible, the Editor makes the code changes, and the Verifier checks whether the changes resolve the issue. Each agent is driven by its own system prompt and instance prompt, which let an LLM perform that role.

The plans and roles are predefined and modeled on real-world collaboration. Encoding the plans as task graphs ensures they are executed precisely and reduces repetition compared with on-the-fly decision-making. For fault localization, CODER combines spectrum-based fault localization (SBFL) with BM25 retrieval. CODER achieves a 28.33% resolution rate on SWE-bench Lite, outperforming the other methods it is compared against. The results indicate that pre-planning with structured task graphs and a multi-agent division of work handles complex tasks more effectively than single-agent, on-the-fly approaches, and that combining LLMs with traditional software engineering strategies is effective for such tasks.
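
To make the fault localization idea concrete, here is a minimal sketch of how an SBFL score and a BM25 score could be fused into one ranking. The summary only states that the two are combined, so everything else is an assumption made for illustration: the Ochiai formula as the SBFL metric, the min-max normalization and weighted sum used for fusion, the function names, and the toy coverage data and issue text.

```python
import math
from collections import Counter


def ochiai(coverage, failed):
    """Ochiai suspiciousness per code element (one common SBFL metric).

    coverage: {test name: set of element names the test executed}
    failed:   set of failing test names
    """
    total_failed = sum(1 for t in coverage if t in failed)
    elements = set().union(*coverage.values())
    scores = {}
    for elem in elements:
        ef = sum(1 for t, cov in coverage.items() if elem in cov and t in failed)
        ep = sum(1 for t, cov in coverage.items() if elem in cov and t not in failed)
        denom = math.sqrt(total_failed * (ef + ep))
        scores[elem] = ef / denom if denom else 0.0
    return scores


def bm25(query_tokens, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each document against the query.

    docs: {element name: list of tokens describing that element}
    """
    n_docs = len(docs)
    avg_len = sum(len(toks) for toks in docs.values()) / n_docs
    doc_freq = Counter(term for toks in docs.values() for term in set(toks))
    scores = {}
    for name, toks in docs.items():
        tf = Counter(toks)
        score = 0.0
        for term in set(query_tokens):
            if term not in tf:
                continue
            df = doc_freq[term]
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[name] = score
    return scores


def min_max(scores):
    """Rescale scores to [0, 1] so the two signals are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def rank_suspects(sbfl, retrieval, weight=0.5):
    """Assumed fusion rule: weighted sum of the normalized scores."""
    a, b = min_max(sbfl), min_max(retrieval)
    fused = {n: weight * a.get(n, 0.0) + (1 - weight) * b.get(n, 0.0)
             for n in set(a) | set(b)}
    return sorted(fused, key=lambda n: (-fused[n], n))


# Hypothetical toy data: three tests, four functions, one failing test.
coverage = {
    "test_parse_ok": {"parse", "tokenize"},
    "test_format_ok": {"format", "tokenize"},
    "test_parse_empty": {"parse", "tokenize", "strip_input"},
}
failed = {"test_parse_empty"}
docs = {
    "parse": ["parse", "input", "string", "into", "tokens"],
    "tokenize": ["split", "string", "into", "tokens"],
    "format": ["format", "tokens", "as", "text"],
    "strip_input": ["strip", "whitespace", "from", "empty", "input"],
}
issue = ["parse", "crashes", "on", "empty", "input"]

ranking = rank_suspects(ochiai(coverage, failed), bm25(issue, docs))
print(ranking)  # ['strip_input', 'parse', 'tokenize', 'format']
```

In this toy case the two signals complement each other: BM25 alone cannot separate `parse` from `strip_input`, while the coverage data singles out `strip_input` because only the failing test executes it. How CODER actually weights or merges the two signals is not described in this summary.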