An Investigation into Misuse of Java Security APIs by Large Language Models


2024 | Zahra Mousavi, Chadni Islam, Kristen Moore, Alsharif Abuadbba, Muhammad Ali Babar
This paper investigates the misuse of Java security APIs by Large Language Models (LLMs), specifically ChatGPT, in generating code. The study evaluates the trustworthiness of ChatGPT in generating secure code for five widely used Java security APIs: Java Cryptography Architecture (JCA), Java Secure Socket Extension (JSSE), Google OAuth, Biometrics, and Play Integrity. It addresses two research questions: (RQ1) How often does ChatGPT generate code containing security API misuse? (RQ2) What types of security API misuses are observed in code generated by ChatGPT?

The study creates 48 programming tasks covering the five security APIs and uses them as prompts to query ChatGPT. The responses are preprocessed to extract valid Java code, which is then analyzed for security API misuse. The analysis reveals that around 70% of the code instances, across 30 attempts per task, contain security API misuse, with 20 distinct misuse types identified. For roughly half of the tasks, the misuse rate reaches 100%, indicating that developers cannot yet rely on ChatGPT to securely implement security API code.
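The summary does not describe how the preprocessing step works, so the following is a purely hypothetical sketch of that stage: it pulls the contents of fenced Java code blocks out of a markdown-formatted LLM response so each snippet can be compiled and analyzed. The class and method names are our own, not the paper's.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Hypothetical preprocessing helper (not from the paper): extracts the
    // bodies of ```java fenced blocks from an LLM response so each snippet
    // can be checked for security API misuse.
    public class ResponsePreprocessor {
        private static final Pattern JAVA_FENCE =
                Pattern.compile("```(?:java)?\\s*\\n(.*?)```", Pattern.DOTALL);

        public static List<String> extractJavaSnippets(String response) {
            List<String> snippets = new ArrayList<>();
            Matcher m = JAVA_FENCE.matcher(response);
            while (m.find()) {
                snippets.add(m.group(1).trim());
            }
            return snippets;
        }
    }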
The study identifies various types of security API misuse, including constant or predictable cryptographic keys, insecure modes of operation for encryption, predictable initialization vectors in CBC mode, short cryptographic keys, constant or predictable seeds for pseudorandom number generators (PRNGs), constant or predictable salts for key derivation, insufficient iteration counts for key derivation, hardcoded constant passwords, and broken hash functions. These misuses pose significant security risks, such as data breaches and unauthorized access to user information. The findings highlight the need for further research to improve the security of LLM-generated code, particularly in the context of security APIs. The study also contributes an evaluation framework for systematically analyzing security API misuse in LLM-generated code, advancing the understanding of the security implications of AI-generated code.
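To make a few of the listed misuse types concrete, here is a minimal illustrative sketch (ours, not taken from the paper) contrasting a common insecure JCA pattern, a hardcoded constant key with ECB mode, against a safer equivalent that uses a freshly generated key and AES-GCM with a random nonce:

    import javax.crypto.Cipher;
    import javax.crypto.KeyGenerator;
    import javax.crypto.SecretKey;
    import javax.crypto.spec.GCMParameterSpec;
    import javax.crypto.spec.SecretKeySpec;
    import java.nio.charset.StandardCharsets;
    import java.security.SecureRandom;

    public class JcaMisuseDemo {

        // MISUSE: constant, hardcoded key (misuse type: constant or
        // predictable cryptographic key) combined with ECB mode (misuse
        // type: insecure mode of operation, leaks plaintext patterns).
        static byte[] encryptInsecure(byte[] plaintext) throws Exception {
            byte[] keyBytes = "0123456789abcdef".getBytes(StandardCharsets.UTF_8);
            SecretKeySpec key = new SecretKeySpec(keyBytes, "AES");
            Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, key);
            return cipher.doFinal(plaintext);
        }

        // Safer pattern: freshly generated 256-bit key and AES-GCM with a
        // random, per-message 96-bit nonce.
        static byte[] encryptSafer(byte[] plaintext) throws Exception {
            KeyGenerator kg = KeyGenerator.getInstance("AES");
            kg.init(256);
            SecretKey key = kg.generateKey();

            byte[] nonce = new byte[12];
            new SecureRandom().nextBytes(nonce); // unpredictable nonce

            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, nonce));
            byte[] ciphertext = cipher.doFinal(plaintext);

            // The nonce is stored or sent alongside the ciphertext;
            // key management is out of scope for this sketch.
            byte[] out = new byte[nonce.length + ciphertext.length];
            System.arraycopy(nonce, 0, out, 0, nonce.length);
            System.arraycopy(ciphertext, 0, out, nonce.length, ciphertext.length);
            return out;
        }
    }

The remaining misuse types follow the same basic pattern: a value that must be secret or unpredictable (a salt, PRNG seed, or password) is instead hardcoded or fixed, or a parameter falls below recommended strength (key length, key-derivation iteration count, or choice of hash algorithm).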