An Empirical Study of the Code Generation of Safety-Critical Software Using LLMs

26 January 2024 | Mingxing Liu, Junfeng Wang, Tao Lin, Quan Ma, Zhiyang Fang and Yanqun Wu
This empirical study explores the use of large language models (LLMs) such as GPT-4 for generating safety-critical software code. It addresses the challenge of improving development efficiency in domains such as nuclear energy and the automotive industry, where safety-critical software is essential. The research investigates three approaches to code generation: generation from overall requirements, from specific requirements, and from augmented prompts. A novel prompt engineering method, Prompt-FDC, is proposed, which integrates basic functional requirements, domain feature generalization, and domain constraints; it raises code completeness from 30% to 100% and achieves a code comment rate of 26.3%. The study also introduces a new software development process and V-model lifecycle for safety-critical software.

Through systematic case studies, the research demonstrates that, with appropriate prompt methods, LLMs can auto-generate safety-critical software code that meets practical engineering application requirements. Two industrial safety-critical software examples are provided, which other researchers can use to explore further code generation methods. The results show that while LLMs can generate code from overall requirements alone, the output often fails to meet all functional requirements; with augmented prompts, the quality of the generated code improves significantly, meeting industry standards and enhancing code compliance, readability, and maintainability. These findings highlight the importance of prompt engineering in code generation for safety-critical software.
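The paper does not reproduce its exact prompt template here, but the Prompt-FDC structure it describes (basic functional requirements, domain feature generalization, and domain constraints) can be sketched as a simple prompt builder. The section headings and the example requirement texts below are illustrative assumptions, not taken from the study:

```python
# Sketch of a Prompt-FDC-style augmented prompt builder.
# The three sections mirror the components named in the study:
# (F) basic functional requirements, (D) domain feature generalization,
# (C) domain constraints. Section wording and example items are hypothetical.

def build_prompt_fdc(functional_reqs, domain_features, domain_constraints):
    """Assemble an augmented prompt from the three Prompt-FDC components."""
    sections = [
        "## Functional requirements",
        *(f"- {r}" for r in functional_reqs),
        "## Domain feature generalization",
        *(f"- {d}" for d in domain_features),
        "## Domain constraints",
        *(f"- {c}" for c in domain_constraints),
        "Generate complete, commented code that satisfies every item above.",
    ]
    return "\n".join(sections)

prompt = build_prompt_fdc(
    ["Trip the protection output when temperature exceeds the setpoint."],
    ["Inputs are periodically sampled analog channels; apply 2-out-of-3 voting."],
    ["No dynamic memory allocation; all loops must have fixed bounds."],
)
print(prompt)
```

The three-part split keeps generic functional intent separate from reusable domain knowledge and hard safety constraints, which is the core idea the study attributes to Prompt-FDC.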
The study also compares the code generation process of LLMs with the widely used SCADE software, showing that the LLM-based approach has distinct advantages in terms of learning curves and software development efficiency. The results indicate that the augmented prompt method achieves the best domain code generation results, with improved code completeness, correctness, and code comment rate. The study concludes that LLMs can be applied to various engineering domains to improve software safety and development efficiency.
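The study reports its quality metrics as percentages (e.g. a 26.3% code comment rate) without spelling out the counting rules. A minimal sketch of one plausible definition, comment lines divided by non-blank lines, assuming C-style `//` line comments; this counting rule is an assumption, not the paper's definition:

```python
def comment_rate(source: str) -> float:
    """Comment lines / non-blank lines, as a percentage.

    A line counts as a comment if it starts with '//' after stripping
    whitespace; block comments are ignored for brevity. This rule is
    an illustrative assumption, not the study's stated metric.
    """
    lines = [ln.strip() for ln in source.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    comments = sum(1 for ln in lines if ln.startswith("//"))
    return 100.0 * comments / len(lines)

sample = """
// Trip logic for one protection channel
int trip(int temp, int setpoint) {
    // Trip when the measured value exceeds the setpoint
    return temp > setpoint;
}
"""
print(f"{comment_rate(sample):.1f}%")  # 2 comment lines of 5 -> 40.0%
```

Correctness metrics such as completeness would similarly need an explicit requirement-by-requirement checklist to be reproducible.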