SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering
30 May 2024 | John Yang, Carlos Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan
SWE-agent is a system that enables language model (LM) agents to autonomously solve software engineering tasks through a custom agent-computer interface (ACI). The ACI is designed to enhance the agent's ability to create and edit code files, navigate repositories, and execute tests. Evaluated on SWE-bench and HumanEvalFix, SWE-agent achieves state-of-the-art pass@1 rates of 12.5% and 87.7%, respectively, significantly outperforming previous non-interactive LMs.

The ACI provides a simplified set of actions for file navigation, editing, and searching, along with concise feedback to the agent. It also includes guardrails that prevent common errors and ensure efficient execution. Tested with GPT-4 Turbo and Claude 3 Opus, the system achieves success rates of 12.47% and 10.5% on SWE-bench, respectively.

The ACI's design principles of simplicity, efficiency, and informative feedback help LM agents perform better on software engineering tasks. The study shows that tailored ACIs can significantly improve LM agent performance, and that the design of the ACI has a meaningful impact on downstream task performance. The system is open-sourced and provides a framework for evaluating LM agents on software engineering tasks.
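To make the ACI ideas concrete, here is a minimal sketch of what a simplified edit command with a guardrail and concise feedback might look like. This is an illustrative assumption, not the actual SWE-agent implementation: the function name `edit`, its signature, and the syntax-check guardrail are all hypothetical.

```python
import ast
from pathlib import Path


def edit(path: str, start: int, end: int, replacement: str, window: int = 4) -> str:
    """Hypothetical ACI-style edit command (not SWE-agent's real interface).

    Replaces lines start..end (1-indexed, inclusive) in a Python file,
    runs a syntax-check guardrail, and returns concise feedback showing
    only a small window around the edit instead of the whole file.
    """
    file = Path(path)
    lines = file.read_text().splitlines()
    new_body = replacement.splitlines()
    new_lines = lines[: start - 1] + new_body + lines[end:]
    candidate = "\n".join(new_lines) + "\n"

    # Guardrail: reject edits that would break Python syntax, leaving
    # the file unchanged and telling the agent why the edit failed.
    try:
        ast.parse(candidate)
    except SyntaxError as exc:
        return f"Edit rejected (syntax error at line {exc.lineno}); file unchanged."

    file.write_text(candidate)

    # Concise feedback: show a few lines of context around the edit.
    lo = max(0, start - 1 - window)
    hi = min(len(new_lines), start - 1 + len(new_body) + window)
    shown = "\n".join(f"{i + 1}: {new_lines[i]}" for i in range(lo, hi))
    return f"Edited {path} lines {start}-{end}. Context:\n{shown}"
```

The guardrail reflects the paper's observation that catching common errors (such as malformed edits) before they reach the repository saves the agent wasted turns, while the bounded feedback window keeps the observation short enough for the LM to use effectively.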