The paper introduces MASAI (Modular Architecture for Software-engineering AI Agents), an architecture designed to solve complex software engineering problems. MASAI consists of multiple sub-agents, each with a well-defined objective and strategy, that work together to resolve issues in software repositories. The sub-agents are a Test Template Generator, an Issue Reproducer, an Edit Localizer, a Fixer, and a Ranker, each handling a specific part of the problem-solving process. The modular design lets different sub-agents use different problem-solving strategies, gather information from different sources, and manage their trajectories efficiently, which avoids unnecessary cost and context inflation.
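The modular structure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `SubAgent` class, the `run_pipeline` function, the placeholder strategies, and the data flow between sub-agents are all assumptions made for illustration. Only the five sub-agent names come from the paper summary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SubAgent:
    """A hypothetical sub-agent: a name, an objective, and a strategy."""
    name: str
    objective: str
    # The strategy receives only the final outputs of earlier sub-agents,
    # not their full trajectories, so each context stays small.
    strategy: Callable[[Dict[str, str]], str]


def run_pipeline(agents: List[SubAgent], issue: str) -> Dict[str, str]:
    """Run the sub-agents in order, passing forward only their outputs."""
    outputs: Dict[str, str] = {"issue": issue}
    for agent in agents:
        # Pass a copy so a strategy cannot mutate the shared record.
        outputs[agent.name] = agent.strategy(dict(outputs))
    return outputs


def build_agents() -> List[SubAgent]:
    """Placeholder strategies; a real system would call an LLM in each one.

    Which sub-agent reads which output is an assumption of this sketch.
    """
    return [
        SubAgent("test_template_generator",
                 "Produce a test template for the repository",
                 lambda o: "test template"),
        SubAgent("issue_reproducer",
                 "Write a test that reproduces the issue",
                 lambda o: f"reproduction test based on {o['test_template_generator']}"),
        SubAgent("edit_localizer",
                 "Find the code locations to edit",
                 lambda o: "candidate edit locations"),
        SubAgent("fixer",
                 "Propose candidate patches",
                 lambda o: f"patches for {o['edit_localizer']}"),
        SubAgent("ranker",
                 "Rank the candidate patches",
                 lambda o: f"ranked {o['fixer']} using {o['issue_reproducer']}"),
    ]


if __name__ == "__main__":
    result = run_pipeline(build_agents(), "example issue")
    for name, output in result.items():
        print(f"{name}: {output}")
```

Because each strategy is an independent callable, one sub-agent can be replaced or tuned without touching the others, which is the property the modular design relies on.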
MASAI was evaluated on the SWE-bench Lite dataset, which consists of 300 GitHub issues from 11 Python repositories. MASAI achieved a resolution rate of 28.33%, the highest among the compared methods, and also outperformed them on localization rate. The paper also analyzes the design choices behind MASAI and their impact on its effectiveness, giving insight into the performance of the individual sub-agents and of the overall architecture.
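As a quick check on the reported figure, the resolution rate is the share of issues resolved out of the 300 in the dataset. The count of 85 resolved issues used below is inferred from 28.33% of 300 rather than stated in this summary, and the function name `resolution_rate` is hypothetical.

```python
def resolution_rate(resolved: int, total: int) -> float:
    """Return the percentage of issues resolved."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * resolved / total


# 85 is an inferred count (28.33% of 300), not a number quoted from the paper.
print(f"{resolution_rate(85, 300):.2f}%")  # prints "28.33%"
```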
The contributions of the paper include proposing MASAI, demonstrating its effectiveness on the SWE-bench Lite dataset, and conducting a thorough analysis of the design decisions and their impact. The authors also contribute their results to the SWE-bench Lite leaderboard and discuss the broader implications and limitations of their work.