26 Jun 2024 | Lin Yang, Chen Yang, Shutao Gao, Weijing Wang, Bo Wang, Qihao Zhu, Xiao Chu, Jianyi Zhou, Guangtai Liang, Qianxiang Wang, Junjie Chen
This empirical study examines how effectively open-source LLMs generate unit tests for Java projects. It evaluates five open-source LLMs, including CodeLlama and DeepSeekCoder, against the commercial GPT-4 and the traditional search-based tool EvoSuite, and investigates how prompt design, in-context learning methods, and code features influence LLM-based unit test generation. Key findings highlight the importance of prompt design, the impact of which code features are included on test coverage, and the limitations of LLMs in producing syntactically valid, defect-detecting tests: while open-source LLMs show promise, hallucination issues still prevent them from consistently generating high-quality unit tests. The results suggest that effective prompting and careful selection of code features are crucial for improving LLM-based test generation, and the authors call for further research into addressing these limitations. Overall, the study offers actionable insights for future research and practical applications of LLMs in unit test generation.
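To make the prompt-design and code-feature discussion concrete, here is a minimal sketch of how such a prompt might be assembled. The template wording, the `build_prompt` helper, and the particular code features included (focal method, enclosing class context, optional few-shot examples) are illustrative assumptions, not the study's actual prompts.

```python
# Minimal sketch of a prompt builder for LLM-based unit test generation.
# The template and the selection of code features are assumptions for
# illustration; the study's actual prompt formats may differ.

FOCAL_METHOD = """\
public static int clamp(int value, int min, int max) {
    if (min > max) throw new IllegalArgumentException("min > max");
    return Math.max(min, Math.min(max, value));
}"""

# Enclosing class signature, one of the code features a prompt might include.
CLASS_CONTEXT = "public final class MathUtils { ... }"


def build_prompt(focal_method: str, class_context: str,
                 examples: list[str] | None = None) -> str:
    """Assemble a test-generation prompt from selected code features.

    `examples` carries optional few-shot (in-context learning)
    demonstrations: focal methods paired with hand-written JUnit tests.
    """
    parts = [
        "You are a Java developer. Write a JUnit 4 test class for the focal method below.",
        "Cover normal inputs, boundary values, and expected exceptions.",
        "",
    ]
    for shot in examples or []:
        parts += ["### Example", shot, ""]
    parts += [
        "### Class context", class_context,
        "### Focal method", focal_method,
        "### Test class",
    ]
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_prompt(FOCAL_METHOD, CLASS_CONTEXT))
```

Varying which features the builder includes (signature only, full class context, with or without few-shot examples) mirrors the kind of prompt-design comparison the study performs, though the exact configurations it tested are not reproduced here.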