LLM-Assisted Static Analysis for Detecting Security Vulnerabilities

27 May 2024 | Ziyang Li, Saikat Dutta, Mayur Naik
The paper introduces IRIS, a novel approach that combines large language models (LLMs) with static analysis to detect security vulnerabilities in Java projects. IRIS targets a key limitation of traditional static taint analysis tools: they depend on manually curated taint specifications for library APIs, and incomplete or imprecise specifications lead to both false negatives and false positives. To evaluate the approach, the authors curate CWE-Bench-Java, a dataset of 120 manually validated security vulnerabilities drawn from real-world Java projects averaging 300,000 lines of code each.

IRIS uses LLMs to infer taint specifications for third-party library APIs and feeds them to the static analysis engine CodeQL, enabling it to detect vulnerabilities it would otherwise miss. In the evaluation, IRIS with GPT-4 detects 69 of the benchmark vulnerabilities, significantly more than the 27 detected by CodeQL alone, while reducing false positives by more than 80%. The paper further examines how the system's contextual analysis step cuts false alarms and assesses the quality of the inferred source and sink specifications. Overall, IRIS demonstrates that pairing LLMs with static analysis can substantially improve the detection of security vulnerabilities in large, complex codebases.
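The central technical step is spec inference: asking an LLM to label third-party library APIs as taint sources or sinks, then handing those labels to CodeQL. The sketch below illustrates that idea in Python under stated assumptions: query_llm is a placeholder for any chat-completion client (stubbed here with a trivial heuristic so the example runs offline), the candidate API list and output format are illustrative, and IRIS's actual prompts and CodeQL extension schema differ in detail.

```python
import json

# Candidate third-party APIs, as might be mined from a project's call graph.
# These signatures are illustrative placeholders, not IRIS's real input.
CANDIDATE_APIS = [
    {"package": "org.apache.commons.io", "class": "FileUtils",
     "method": "readFileToString", "signature": "(File)"},
    {"package": "java.lang", "class": "Runtime",
     "method": "exec", "signature": "(String)"},
]

PROMPT_TEMPLATE = (
    "Classify the following Java library API as a taint SOURCE "
    "(returns attacker-controllable data), a taint SINK (performs a "
    "dangerous operation on its arguments), or NEITHER. Answer with one word.\n"
    "API: {package}.{cls}.{method}{signature}"
)

def query_llm(prompt: str) -> str:
    """Placeholder for a chat-completion call (e.g., to GPT-4).

    Stubbed with a trivial heuristic so this sketch runs offline;
    IRIS's actual prompting is richer (few-shot examples, CWE context).
    """
    return "SINK" if "exec" in prompt else "SOURCE"

def infer_specs(apis):
    """Label each candidate API and keep only sources and sinks."""
    specs = []
    for api in apis:
        label = query_llm(PROMPT_TEMPLATE.format(
            package=api["package"], cls=api["class"],
            method=api["method"], signature=api["signature"]))
        if label in ("SOURCE", "SINK"):
            specs.append({**api, "kind": label.lower()})
    return specs

if __name__ == "__main__":
    # In the paper, inferred labels like these are compiled into
    # CodeQL-consumable taint specifications before the analysis runs.
    print(json.dumps(infer_specs(CANDIDATE_APIS), indent=2))
```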
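The summary also credits a contextual analysis step with the reduction in false positives: after CodeQL reports candidate taint paths, the LLM reviews each alarm with surrounding code context. A minimal sketch of that filtering step, again with a stubbed LLM call; the alert fields here are hypothetical, and IRIS's real contextual analysis supplies richer context such as the enclosing function bodies along the source-sink path.

```python
def classify_alert(alert: dict, query_llm) -> bool:
    """Ask the LLM whether a reported taint path is a real vulnerability."""
    prompt = (
        "A static analyzer reports a taint flow.\n"
        f"Source: {alert['source']}\n"
        f"Sink: {alert['sink']}\n"
        f"Code along the path:\n{alert['path_snippet']}\n"
        "Considering sanitization and reachability in this context, "
        "is this a true vulnerability? Answer YES or NO."
    )
    return query_llm(prompt).strip().upper().startswith("YES")

if __name__ == "__main__":
    # Example usage with a canned LLM response standing in for GPT-4.
    alert = {
        "source": "FileUtils.readFileToString(...)",
        "sink": "Runtime.exec(...)",
        "path_snippet": "String cmd = read(); Runtime.getRuntime().exec(cmd);",
    }
    print(classify_alert(alert, lambda prompt: "YES"))
```

Alarms the LLM rejects are dropped, which is how this kind of post-filtering can trade a small recall cost for a large cut in false positives.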