An Investigation into Misuse of Java Security APIs by Large Language Models


4 Apr 2024 | Zahra Mousavi, Chadni Islam, Kristen Moore, Alsharif Abuadbba, Muhammad Ali Babar
This paper investigates the misuse of security Application Programming Interfaces (APIs) by Large Language Models (LLMs), focusing on ChatGPT's ability to generate secure Java code. The study addresses two research questions: (RQ1) how often security API misuse occurs in ChatGPT-generated code, and (RQ2) which types of misuse appear. To conduct the evaluation, the authors compiled 48 programming tasks spanning five widely used security APIs: Java Cryptography Architecture (JCA), Java Secure Socket Extension (JSSE), Google OAuth, Biometrics, and Play Integrity. They employed both automated and manual methods to detect security API misuse in the code generated by ChatGPT.

The findings reveal that approximately 70% of the code instances, across 30 attempts per task, contain security API misuse, with 20 distinct misuse types identified. Notably, for roughly half of the tasks the misuse rate reaches 100%, indicating significant challenges in relying on ChatGPT for secure use of security APIs. The results also show that while ChatGPT can generate valid and secure code for some functionalities, it struggles with complex tasks and often selects the wrong API or uses deprecated APIs, leading to security vulnerabilities.

The study highlights the need for further research to improve the security of LLM-generated code and raises awareness among developers about the risks of using ChatGPT in security-sensitive contexts. The paper also covers background and related work, methodology, and experimental results, providing a comprehensive overview of the study's approach and findings.
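To make the kind of JCA misuse the study catalogues concrete, the sketch below contrasts a commonly flagged pattern, requesting the bare "AES" transformation (which the default SunJCE provider resolves to insecure ECB mode), with an authenticated AES-GCM variant using a fresh random IV. This snippet is not taken from the paper; the task, class name, and code structure are illustrative assumptions.

    import javax.crypto.Cipher;
    import javax.crypto.KeyGenerator;
    import javax.crypto.SecretKey;
    import javax.crypto.spec.GCMParameterSpec;
    import java.nio.charset.StandardCharsets;
    import java.security.SecureRandom;

    public class JcaMisuseExample {

        // MISUSE: "AES" alone falls back to the provider default,
        // typically "AES/ECB/PKCS5Padding". ECB leaks plaintext patterns
        // and is a classic finding of static crypto-misuse detectors.
        static byte[] encryptInsecure(SecretKey key, byte[] plaintext) throws Exception {
            Cipher cipher = Cipher.getInstance("AES");   // implicit ECB mode
            cipher.init(Cipher.ENCRYPT_MODE, key);
            return cipher.doFinal(plaintext);
        }

        // Safer variant: authenticated encryption (AES-GCM) with a fresh random IV.
        static byte[] encryptSecure(SecretKey key, byte[] plaintext) throws Exception {
            byte[] iv = new byte[12];                    // 96-bit IV recommended for GCM
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
            byte[] ciphertext = cipher.doFinal(plaintext);
            // Prepend the IV so the receiver can decrypt; GCM IVs need not be secret.
            byte[] out = new byte[iv.length + ciphertext.length];
            System.arraycopy(iv, 0, out, 0, iv.length);
            System.arraycopy(ciphertext, 0, out, iv.length, ciphertext.length);
            return out;
        }

        public static void main(String[] args) throws Exception {
            KeyGenerator keyGen = KeyGenerator.getInstance("AES");
            keyGen.init(256);
            SecretKey key = keyGen.generateKey();
            byte[] msg = "sensitive payload".getBytes(StandardCharsets.UTF_8);
            System.out.println("insecure ciphertext bytes: " + encryptInsecure(key, msg).length);
            System.out.println("secure ciphertext bytes:   " + encryptSecure(key, msg).length);
        }
    }

Static analyzers of the sort used in misuse studies typically flag both the implicit ECB default and the absence of a randomized IV, which is why explicit transformation strings and per-message IVs are standard guidance when using the JCA.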
The study concludes by emphasizing the importance of addressing these issues to ensure the trustworthiness of LLM-generated code in secure software development.