LLM-Powered Test Case Generation for Detecting Tricky Bugs


24 April 2024 | Kaibo Liu, Yiyang Liu, Zhenpeng Chen, Jie M. Zhang, Yudong Han, Yun Ma, Ge Li, and Gang Huang
This paper proposes AID, an automated test case generation method for detecting tricky bugs in plausibly correct programs. AID combines large language models (LLMs) with differential testing to generate both test inputs and test oracles. The method consists of three steps: program variant generation, input generation, and differential testing. In the first two steps, LLMs are used to generate program variants and test inputs. In the third step, differential testing is used to identify inconsistencies among program outputs. AID is evaluated on two large-scale datasets, TrickyBugs and EvalPlus, where it outperforms three state-of-the-art baselines in terms of recall, precision, and F1 score. The results show that AID achieves the best performance in detecting tricky bugs in both human-written and AI-generated programs. The paper's contributions include a novel LLM-powered test oracle generation approach, an extensive evaluation, and a replication package. The paper also discusses threats to validity and concludes that AID is a promising approach for automated test case generation.
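To make the differential-testing step more concrete, the sketch below shows one way such a comparison loop could look. It is an illustration only, not the authors' implementation: the function names (`differential_test`, `program_under_test`, `variants`, `test_inputs`) and the majority-vote oracle heuristic are assumptions introduced here to show the general idea of cross-checking a program against LLM-generated variants on LLM-generated inputs.

```python
# Illustrative sketch (not the paper's exact algorithm): flag inputs on
# which the program under test disagrees with the consensus output of
# LLM-generated program variants.
from collections import Counter


def differential_test(program_under_test, variants, test_inputs):
    """Return (input, consensus_output) pairs where the program under test
    deviates from the majority output of the variants."""
    suspicious = []
    for x in test_inputs:
        outputs = []
        for variant in variants:
            try:
                outputs.append(repr(variant(x)))
            except Exception:
                outputs.append(None)  # a crashing variant casts no vote
        votes = Counter(o for o in outputs if o is not None)
        if not votes:
            continue
        oracle, count = votes.most_common(1)[0]
        # Only trust the consensus when a strict majority of variants agree.
        if count <= len(variants) // 2:
            continue
        if repr(program_under_test(x)) != oracle:
            suspicious.append((x, oracle))
    return suspicious
```

In this sketch the agreed-upon variant output serves as a pseudo test oracle, so no hand-written expected outputs are needed; how AID actually derives and validates its oracles is detailed in the paper itself.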