7 Apr 2024 | Chen Yang, Junjie Chen, Bin Lin, Jianyi Zhou, Ziqi Wang
This paper introduces TELPA, a novel LLM-based test generation technique that improves the coverage of hard-to-cover branches. TELPA targets two major challenges in test generation: complex object construction and intricate inter-procedural dependencies. To tackle complex object construction, it performs backward method-invocation analysis, extracting method invocation sequences that show how the target method is actually used, so that valid objects can be constructed across diverse usage scenarios. To handle inter-procedural dependencies, it conducts forward method-invocation analysis, extracting the methods the target method invokes and incorporating their source code into the prompt so the LLM can understand the semantics of the target branch. TELPA also samples counter-examples to steer the LLM away from previously generated ineffective tests, improving test diversity and coverage.
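To make the two analyses concrete, here is a minimal sketch of how caller and callee information could be harvested with Python's `ast` module. The summary does not detail TELPA's actual implementation, so `find_callers` and `find_callees` are illustrative assumptions rather than the paper's code:

```python
import ast
from pathlib import Path

def find_callers(project_dir: str, target: str) -> list[str]:
    """Backward analysis (sketch): collect the source of every function
    that invokes `target`, to show the LLM real usage scenarios."""
    callers = []
    for path in Path(project_dir).rglob("*.py"):
        source = path.read_text(encoding="utf-8")
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                if any(_calls(sub, target) for sub in ast.walk(node)):
                    callers.append(ast.get_source_segment(source, node))
    return callers

def find_callees(func_source: str) -> set[str]:
    """Forward analysis (sketch): collect the names of methods invoked by
    the target method, whose source would then be added to the prompt."""
    return {
        node.func.attr if isinstance(node.func, ast.Attribute) else node.func.id
        for node in ast.walk(ast.parse(func_source))
        if isinstance(node, ast.Call)
        and isinstance(node.func, (ast.Attribute, ast.Name))
    }

def _calls(node: ast.AST, target: str) -> bool:
    """True if `node` is a call whose callee name matches `target`."""
    if not isinstance(node, ast.Call):
        return False
    func = node.func
    return (isinstance(func, ast.Attribute) and func.attr == target) or \
           (isinstance(func, ast.Name) and func.id == target)
```

The caller snippets give the LLM worked examples of object construction, while the callee sources let it reason about branch conditions that depend on other methods.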
TELPA integrates the program analysis results and counter-examples into the prompt to guide LLMs in generating diverse tests that can reach hard-to-cover branches. Its feedback-based process iteratively refines generated tests based on coverage results. TELPA is activated only when existing test generation tools fail to increase coverage within a predefined timeframe, which keeps the approach cost-effective. The technique was evaluated on 27 open-source Python projects, demonstrating significant improvements in branch coverage over state-of-the-art SBST and LLM-based techniques: average improvements of 31.39% and 22.22% over Pynguin and CODAMOSA, respectively. The study also confirmed the contribution of each main component in TELPA, including backward and forward method-invocation analysis, counter-example sampling, and coverage-based feedback.
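A rough illustration of this feedback loop follows. The summary does not give TELPA's exact interfaces, so `llm_generate` and `run_and_measure` below are hypothetical stubs; each round samples a few previously ineffective tests as counter-examples and asks the model to diverge from them:

```python
import random

def feedback_loop(target_branch, usage_examples, callee_sources,
                  llm_generate, run_and_measure,
                  max_rounds: int = 5, sample_size: int = 3):
    """Sketch of a TELPA-style coverage-feedback loop.
    `llm_generate(prompt) -> str` and `run_and_measure(test, branch) -> bool`
    are assumed stubs, not the paper's real interfaces."""
    counter_examples: list[str] = []
    for _ in range(max_rounds):
        prompt = "\n\n".join([
            f"Target branch to cover:\n{target_branch}",
            "Real usage scenarios (backward analysis):\n" + "\n".join(usage_examples),
            "Source of invoked methods (forward analysis):\n" + "\n".join(callee_sources),
            "Generate a test that DIFFERS from these ineffective ones:\n"
            + "\n".join(counter_examples),
        ])
        test_code = llm_generate(prompt)
        if run_and_measure(test_code, target_branch):
            return test_code  # the hard-to-cover branch was reached
        # keep a small, randomly sampled set of failures for the next round
        counter_examples.append(test_code)
        counter_examples = random.sample(
            counter_examples, min(sample_size, len(counter_examples)))
    return None
```

Sampling rather than accumulating all failures keeps the prompt short while still pushing the model away from regions of the input space it has already tried.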
TELPA's effectiveness was further validated under different configurations, including with a smaller LLM; the results showed that TELPA significantly improves test coverage regardless of LLM scale. The study also discussed potential threats to validity, such as the chosen parameter settings and metrics. Overall, TELPA demonstrates the potential of combining program analysis with LLM-based test generation to improve the coverage of hard-to-cover branches in software testing.