November 15–19, 2024 | Nadia Alshahwan, Jubin Chheda, Anastasia Finogenova, Beliz Gokkaya, Mark Harman, Inna Harper, Alexandru Marginean, Shubho Sengupta, Eddy Wang
Meta has developed TestGen-LLM, an automated unit test improvement tool that uses large language models (LLMs) to extend existing human-written tests. Every generated test case must pass a series of filters that guarantee a measurable improvement over the original test suite, eliminating problems caused by LLM hallucination.

TestGen-LLM was deployed at Meta's test-a-thons for Instagram and Facebook, where it improved 11.5% of the test classes to which it was applied, and 73% of its recommendations were accepted for production deployment. In an evaluation on Instagram's Reels and Stories products, 75% of its test cases built correctly, 57% passed reliably, and 25% increased coverage.

The tool's filtration process discards any test case that cannot be guaranteed to meet these improvement assurances. To generate candidates, TestGen-LLM employs an ensemble approach, combining multiple LLMs, prompts, and hyperparameters to produce test cases that improve coverage without regressing the existing suite. The approach is an instance of Assured LLM-based Software Engineering (Assured LLMSE), which provides verifiable guarantees of code improvement.

TestGen-LLM was rolled out gradually and incrementally, starting with initial trials and evolving into a fully automated tool. Its success highlights the potential of LLMs in industrial-scale software testing and development.
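The filtration idea can be illustrated with a minimal sketch. Everything below is hypothetical: TestGen-LLM's internals are not public, so the `Candidate` fields, the five-run flakiness check, and the filter ordering are illustrative assumptions, not Meta's implementation. The sketch models the three assurance filters described above: a candidate test must build, must pass reliably on repeated runs, and must add coverage beyond the existing suite.

```python
from dataclasses import dataclass

RUNS = 5  # assumption: repeat each candidate test N times to screen out flakiness


@dataclass
class Candidate:
    """A test case proposed by one (LLM, prompt, hyperparameter) ensemble member."""
    test_name: str
    builds: bool          # does the test class still compile with this test added?
    passes: int           # passes out of RUNS repeated executions
    extra_coverage: int   # lines newly covered beyond the existing suite


def survives_filters(c: Candidate) -> bool:
    """Apply the assurance filters in order; any failure discards the candidate."""
    if not c.builds:                 # filter 1: must build correctly
        return False
    if c.passes < RUNS:              # filter 2: must pass reliably, not flakily
        return False
    return c.extra_coverage > 0      # filter 3: must measurably increase coverage


def filtration(candidates: list[Candidate]) -> list[Candidate]:
    """Keep only candidates whose improvement over the suite is guaranteed."""
    return [c for c in candidates if survives_filters(c)]


# Hypothetical ensemble output: each candidate came from a different
# LLM/prompt/temperature combination.
ensemble = [
    Candidate("testNullInput", builds=True, passes=5, extra_coverage=3),
    Candidate("testHallucinatedApi", builds=False, passes=0, extra_coverage=0),
    Candidate("testFlaky", builds=True, passes=4, extra_coverage=2),
    Candidate("testRedundant", builds=True, passes=5, extra_coverage=0),
]
kept = filtration(ensemble)
```

Here only `testNullInput` survives: the hallucinated-API test fails to build, the flaky test fails one of its five runs, and the redundant test adds no new coverage. This is how the filters convert "an LLM suggested a test" into "this test is an assured improvement" before any recommendation reaches a developer.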