24 Mar 2024 | Juan Altmayer Pizzorno, Emery D. Berger
This paper introduces COVERUP, a novel system that generates high-coverage Python regression tests by combining coverage analysis with large language models (LLMs). COVERUP improves coverage iteratively. It interleaves coverage analysis with dialogs with the LLM and focuses the model's attention on the lines and branches that remain uncovered. Compared to CODAMOSA, a hybrid LLM/search-based software testing system, COVERUP substantially improves coverage, achieving a median line coverage of 81%, branch coverage of 53%, and line-branch coverage of 78%. COVERUP outperforms CODAMOSA in both overall and per-module coverage. The evaluation also shows that the iterative, coverage-guided dialog with the LLM is crucial to COVERUP's effectiveness: it contributes to nearly half of the successful test generations.