April 15–16, 2024, Lisbon, Portugal | André Silva*, Nuno Saavedra*, Martin Monperrus
**GitBug-JAVA: A Reproducible Benchmark of Recent Java Bugs**
**Authors:** André Silva, Nuno Saavedra, Martin Monperrus
**Abstract:**
Bug-fix benchmarks are crucial for evaluating methodologies in automatic program repair (APR) and fault localization (FL). However, existing benchmarks, such as Defects4J, need to evolve to incorporate recent bug-fixes while remaining reproducible. To address these gaps, the authors present GitBug-JAVA, a reproducible benchmark of recent Java bugs. GitBug-JAVA features 199 bugs extracted from the 2023 commit history of 55 notable open-source repositories. The methodology preserves each bug-fix in a fully reproducible environment. GitBug-JAVA is published on GitHub at https://github.com/gitbugactions/gitbug-java.
**Keywords:** bug-fix benchmark, Java, reproducibility, automatic program repair, fault localization
**Contributions:**
- A sound methodology for constructing a reproducible bug-fix benchmark.
- GitBug-JAVA, a reproducible benchmark of recent Java bugs containing 199 bugs from 55 repositories.
- A publicly available repository on GitHub with proper documentation and a visualization companion website.
**Building GitBug-JAVA:**
- **Finding Locally Executable Repositories:** Select repositories based on popularity, size, and active maintenance.
- **Selecting Locally Executable Bug-Fixes:** Identify bug-fixes with specific patterns and test execution results.
- **Exporting Reproduction Environments:** Build offline reproduction environments to ensure long-term reproducibility.
- **Checking Flakiness and Offline Reproduction:** Verify that bug-fixes can be reproduced offline and exclude flaky tests.
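The selection and flakiness steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Candidate` type, the `is_bug_fix` criterion (at least one test fails before the patch and passes after it, and no test fails after it), and the five-run stability check in `is_stable` are all assumptions made for illustration, since the summary does not spell out the exact patterns used.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical representation: test name -> True if the test passed.
TestResults = Dict[str, bool]


@dataclass
class Candidate:
    """A candidate commit with test outcomes before and after the patch (assumed shape)."""
    commit: str
    before: TestResults
    after: TestResults


def is_bug_fix(c: Candidate) -> bool:
    """Assumed criterion: some test goes from failing to passing, and nothing fails afterwards."""
    fixed = [t for t, ok in c.after.items() if ok and c.before.get(t) is False]
    return bool(fixed) and all(c.after.values())


def is_stable(run_tests: Callable[[], TestResults], runs: int = 5) -> bool:
    """Assumed flakiness check: repeated executions must yield identical results."""
    first = run_tests()
    return all(run_tests() == first for _ in range(runs - 1))
```

A candidate would be kept only if `is_bug_fix` holds and `is_stable` returns `True` when the tests are re-run in the exported offline environment; the number of repetitions is a tunable assumption.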
**Insights into GitBug-JAVA:**
- **Repositories:** The benchmark includes a diverse range of relevant open-source repositories.
- **Bug-Fixes:** The median patch contains changes in a single file, two hunks, and 9 lines. The median number of tests per bug-fix is 497.
- **Reproduction Environments:** The benchmark provides offline reproduction environments for each bug-fix, ensuring long-term reproducibility.
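To make the patch-size figures concrete, the following sketch shows one way such statistics could be derived from unified diffs. How the authors count files, hunks, and lines is not stated here, so the counting rules in the hypothetical `patch_stats` helper (one file per `diff --git` header, one hunk per `@@` header, changed lines as added plus removed lines) are assumptions.

```python
import statistics
from typing import Dict, List


def patch_stats(diff: str) -> Dict[str, int]:
    """Count files, hunks, and changed lines in a git-style unified diff (assumed counting rules)."""
    files = hunks = lines = 0
    in_hunk = False
    for line in diff.splitlines():
        if line.startswith("diff --git "):
            files += 1
            in_hunk = False  # the ---/+++ file headers that follow are not changes
        elif line.startswith("@@"):
            hunks += 1
            in_hunk = True
        elif in_hunk and line.startswith(("+", "-")):
            lines += 1
    return {"files": files, "hunks": hunks, "lines": lines}


def median_stats(diffs: List[str]) -> Dict[str, float]:
    """Median of each statistic across a collection of patches."""
    stats = [patch_stats(d) for d in diffs]
    return {k: statistics.median(s[k] for s in stats) for k in ("files", "hunks", "lines")}
```

Under these rules, a patch touching one file in two hunks with one replaced line and one added line would count as 1 file, 2 hunks, and 3 changed lines.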
**Related Work:**
- Several bug-fix benchmarks have been proposed, but many lack reproducibility or relevance to real-world software systems.
- GitBug-JAVA stands out by including fully reproducible, test-suite-based bug-fixes from real-world projects.
**Conclusion:**
GitBug-JAVA is a reproducible benchmark of recent Java bugs, ensuring relevance to current development practices and long-term reproducibility. It is publicly available and provides valuable resources for future research in APR and FL.