24 Mar 2024 | Lezhi Ma*, Shangqing Liu†, Yi Li‡, Xiaofei Xie§, Lei Bu†
SpecGen is an innovative technique for generating formal program specifications using Large Language Models (LLMs). The approach aims to overcome the limitations of existing methods, which often rely on predefined templates or grammars, by leveraging the code comprehension capabilities of LLMs. SpecGen consists of two phases: a conversational approach to guide the LLM in generating specifications and a mutation-based approach to refine and verify the generated specifications. The conversational phase uses a few-shot learning approach with verification failure information to improve the quality of generated specifications. The mutation-based phase employs four mutation operators to modify the LLM's output and select verifiable specifications through a heuristic selection strategy. Experimental results on two datasets, including the SV-COMP Java category benchmark and a manually constructed dataset, demonstrate that SpecGen outperforms existing methods in generating verifiable specifications, with 279 out of 385 programs successfully handled. The quality of the generated specifications is further evaluated through user studies, showing that SpecGen can accurately and comprehensively describe the semantics of the input programs.SpecGen is an innovative technique for generating formal program specifications using Large Language Models (LLMs). The approach aims to overcome the limitations of existing methods, which often rely on predefined templates or grammars, by leveraging the code comprehension capabilities of LLMs. SpecGen consists of two phases: a conversational approach to guide the LLM in generating specifications and a mutation-based approach to refine and verify the generated specifications. The conversational phase uses a few-shot learning approach with verification failure information to improve the quality of generated specifications. The mutation-based phase employs four mutation operators to modify the LLM's output and select verifiable specifications through a heuristic selection strategy. Experimental results on two datasets, including the SV-COMP Java category benchmark and a manually constructed dataset, demonstrate that SpecGen outperforms existing methods in generating verifiable specifications, with 279 out of 385 programs successfully handled. The quality of the generated specifications is further evaluated through user studies, showing that SpecGen can accurately and comprehensively describe the semantics of the input programs.