**AGENTFL: Scaling LLM-based Fault Localization to Project-Level Context**
**Abstract:**
Fault Localization (FL) is a critical step in software debugging, and recent advancements in Large Language Models (LLMs) have shown promising performance in diagnosing bugs. However, existing LLM-based FL techniques struggle with long contexts and are often limited to localizing bugs within small code scopes. To address this limitation, this paper introduces AGENTFL, a multi-agent system based on ChatGPT for automated fault localization. AGENTFL models the FL task as a three-step process: comprehension, navigation, and confirmation, each involving specialized agents with diverse expertise. The system employs auxiliary strategies such as Test Behavior Tracking, Document-Guided Search, and Multi-Round Dialogue to overcome the challenges in each step. Evaluations on the Defects4J-V1.2.0 benchmark show that AGENTFL localizes 157 out of 395 bugs within Top-1, outperforming other LLM-based approaches and complementing state-of-the-art learning-based techniques. Ablation studies and a user study further validate the effectiveness and usability of AGENTFL, demonstrating its ability to provide suspicious methods and rationales. Cost analysis reveals that AGENTFL is efficient, averaging only $0.074 and 97 seconds per bug.
**Keywords:**
Large Language Model, Fault Localization
**Introduction:**
Fault Localization (FL) is a crucial but time-consuming phase in software debugging. Existing techniques, such as spectrum-based and learning-based FL, have shown progress but are limited to small code scopes. LLMs, with their strong code comprehension capabilities, offer a promising solution. However, they struggle to process long contexts and to identify critical information in large codebases. AGENTFL decomposes the FL process into three stages—comprehension, navigation, and confirmation—using multiple LLM-driven agents to handle different tasks. The system enhances LLMs with domain knowledge and external components to manage complex debugging tasks effectively.
**Background & Related Work:**
Spectrum-based and learning-based FL techniques have been extensively studied, but they rely heavily on coverage information and code features. LLMs, while powerful, face limitations in handling long contexts and require auxiliary strategies to overcome these challenges.
**AGENTFL Approach:**
AGENTFL is a multi-agent system that localizes buggy methods for an entire project. It consists of four specialized agents: Test Code Reviewer, Source Code Reviewer, Software Architect, and Software Test Engineer. Each agent is enhanced with domain knowledge and external capabilities to perform specific tasks. The system employs strategies like Test Behavior Tracking, Document-Guided Search, and Multi-Round Dialogue to guide the LLMs through the debugging process.
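The three-step pipeline above can be sketched as follows. This is a minimal, illustrative sketch only: the `fake_llm` stub stands in for a real chat model such as ChatGPT, and all prompts, helper names, and canned responses are hypothetical, not AGENTFL's actual implementation.

```python
# Illustrative sketch of a three-step multi-agent FL pipeline:
# comprehension -> navigation -> confirmation, each handled by a
# role-specialized agent. All names and prompts are hypothetical.
from dataclasses import dataclass


@dataclass
class Agent:
    role: str          # e.g. "Test Code Reviewer"
    instructions: str  # role-specific prompt encoding domain knowledge

    def ask(self, llm, task: str) -> str:
        # Each agent wraps the shared LLM with its own role and instructions.
        return llm(f"[{self.role}] {self.instructions}\nTask: {task}")


def fake_llm(prompt: str) -> str:
    # Stand-in for a chat-model API call; returns canned answers per step.
    if "failing test" in prompt:
        return "The test expects a sorted list but got an unsorted one."
    if "candidate methods" in prompt:
        return "Candidates: Sorter.sort, Sorter.merge"
    return "Most suspicious: Sorter.merge (ranked Top-1)"


def localize(llm, failing_test: str) -> str:
    # Step 1: comprehension -- summarize the failing test's behavior.
    reviewer = Agent("Test Code Reviewer", "Summarize why the test fails.")
    behavior = reviewer.ask(llm, f"failing test: {failing_test}")

    # Step 2: navigation -- search the project for candidate buggy methods.
    architect = Agent("Software Architect", "Locate relevant methods.")
    candidates = architect.ask(llm, f"candidate methods for: {behavior}")

    # Step 3: confirmation -- rank and confirm the most suspicious method.
    tester = Agent("Software Test Engineer", "Confirm the buggy method.")
    return tester.ask(llm, f"confirm among: {candidates}")


print(localize(fake_llm, "testSortReturnsOrdered"))
```

In the actual system, the stub would be replaced by real model calls, and each step would additionally draw on the auxiliary strategies named above (e.g. Test Behavior Tracking during comprehension, Document-Guided Search during navigation).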
**Evaluation:**
AGENTFL is evaluated on the Defects4J-V1.2.0 benchmark, showing superior performance compared to LLM-based baselines and complementing existing learning-based techniques. Ablation studies and a user study further validate its effectiveness and usability.