LLMs in Web Development: Evaluating LLM-Generated PHP Code Unveiling Vulnerabilities and Limitations

21 May 2024 | Rebeka Tóth, Tamas Bisztray, and László Erdődi
This study evaluates the security of web application code generated by Large Language Models (LLMs), analyzing 2,500 GPT-4 generated PHP websites. These were deployed in Docker containers and tested for vulnerabilities using a hybrid approach of Burp Suite active scanning, static analysis, and manual review. The investigation focuses on identifying Insecure File Upload, SQL Injection, Stored XSS, and Reflected XSS in GPT-4 generated PHP code, and it highlights the security risks and implications of deploying such code in real-world scenarios. Overall, 2,440 vulnerable parameters were found. According to Burp's scan, 11.56% of the sites can be compromised outright; adding static scan results, 26% had at least one vulnerability exploitable through web interaction. Certain coding scenarios, such as file upload functionality, are insecure 78% of the time, underscoring significant risks to software safety and security. The dataset and vulnerability records are publicly available on GitHub.

Beyond the headline figures, the analysis found that 54% of sites using SQL queries lacked prepared statements, leaving them open to SQL injection attacks, and that sites with file upload functionality were insecure in 78% of cases. The study concludes that GPT-4 is highly susceptible to generating PHP code containing SQL injection, XSS, and insecure file upload vulnerabilities, and it emphasizes the need for thorough testing and evaluation when using generative AI in software development. It also discusses the limitations of automated scanning tools and the importance of human oversight and code review in identifying vulnerabilities. The findings underscore the critical role of security in web development and the need for rigorous safeguards when deploying LLM-generated code.
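To make the prepared-statement finding concrete, the following is a minimal sketch of the vulnerability class, not code drawn from the study's dataset; the connection details, table schema, and the username parameter are invented for illustration.

    <?php
    // Hypothetical lookup; connection details and schema are assumptions,
    // not taken from the paper's corpus.
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

    $username = $_GET['username'] ?? '';

    // Vulnerable: user input concatenated into the query string.
    // A value such as  ' OR '1'='1  rewrites the query's logic.
    $rows = $pdo->query(
        "SELECT id, username FROM users WHERE username = '$username'"
    )->fetchAll();

    // Safe: a prepared statement binds the input as data, not as SQL.
    $stmt = $pdo->prepare('SELECT id, username FROM users WHERE username = ?');
    $stmt->execute([$username]);
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

The 54% figure suggests GPT-4 frequently emits the first pattern even though the prepared-statement form is barely longer.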
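The file upload result (78% of upload handlers insecure) typically corresponds to handlers that trust the client-supplied filename. The sketch below contrasts that pattern with a hardened variant; the upload field name and the uploads/ directory are assumptions for illustration, not code from the paper.

    <?php
    // Illustrative handlers; field name and paths are assumptions.

    // Insecure: keeps the client-supplied name and extension, so a
    // file named shell.php lands in a web-served directory and can
    // be executed by requesting its URL.
    move_uploaded_file(
        $_FILES['upload']['tmp_name'],
        __DIR__ . '/uploads/' . $_FILES['upload']['name']
    );

    // Hardened: allow-list the real MIME type and store the file
    // under a server-generated name with a fixed extension.
    $allowed = ['image/png' => 'png', 'image/jpeg' => 'jpg'];
    $mime = mime_content_type($_FILES['upload']['tmp_name']);
    if (isset($allowed[$mime])) {
        $name = bin2hex(random_bytes(16)) . '.' . $allowed[$mime];
        move_uploaded_file($_FILES['upload']['tmp_name'], __DIR__ . '/uploads/' . $name);
    }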
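For the two XSS categories the study targets, the root cause is the same: request or database data echoed into HTML without escaping. A minimal reflected-XSS sketch, with the q parameter invented for illustration:

    <?php
    // Reflected XSS sketch; the 'q' parameter is an assumption.

    // Vulnerable: request data echoed verbatim, so a visit to
    // ?q=<script>alert(1)</script> runs the script in the browser.
    echo 'You searched for: ' . ($_GET['q'] ?? '');

    // Safe: escape on output so the payload renders as inert text.
    echo 'You searched for: ' . htmlspecialchars($_GET['q'] ?? '', ENT_QUOTES, 'UTF-8');

Stored XSS follows the same template, except the unescaped value is read back from the database rather than from the request.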