This paper introduces IRIS, a novel neuro-symbolic approach that combines large language models (LLMs) with static analysis to detect security vulnerabilities in whole Java repositories. IRIS addresses the limitations of traditional static analysis tools, such as CodeQL, which often fail to detect vulnerabilities due to incomplete or missing taint specifications. The approach leverages LLMs to infer taint specifications for third-party library APIs, which are then used by static analysis tools to detect vulnerabilities. IRIS also incorporates contextual analysis to reduce false positives and improve the accuracy of vulnerability detection.
The authors curate a new dataset, CWE-Bench-Java, containing 120 manually validated security vulnerabilities in real-world Java projects. These projects are complex, averaging 300,000 lines of code, with some containing up to 7 million lines. IRIS detects 69 of these vulnerabilities using GPT-4, while CodeQL detects only 27. IRIS also significantly reduces the number of false alarms, with a best-case reduction of over 80%.
The IRIS framework consists of four main stages: (1) extracting candidate APIs, (2) inferring taint specifications using LLMs, (3) performing taint analysis with static analysis tools, and (4) triaging alerts using contextual analysis. The framework is evaluated on CWE-Bench-Java using eight diverse LLMs; IRIS achieves its best results with GPT-4, detecting 69 vulnerabilities, 42 more than CodeQL. Its context-based filtering technique also reduces false positives by 80%.
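The four stages can be sketched as a simple pipeline. Every function body below is a hypothetical stand-in for the paper's real components (CodeQL extraction and taint queries, LLM prompting), stubbed with fixed data so the data flow between stages is visible:

```python
def extract_candidate_apis(repo):
    """Stage 1: collect third-party API calls that could carry taint.
    Stubbed: return the repo's recorded library calls."""
    return repo["library_calls"]

def infer_taint_specs(apis):
    """Stage 2: in IRIS an LLM classifies each API as a taint source,
    sink, or neither. Stubbed with a fixed lookup for illustration."""
    known = {"getParameter": "source", "executeQuery": "sink"}
    return {api: known[api] for api in apis if api in known}

def run_taint_analysis(repo, specs):
    """Stage 3: a static analyzer (e.g. CodeQL) searches for flows from
    sources to sinks. Stubbed: one alert if both kinds are present."""
    kinds = set(specs.values())
    return ["alert: source reaches sink"] if {"source", "sink"} <= kinds else []

def triage_alerts(alerts):
    """Stage 4: contextual analysis filters likely false positives.
    Stubbed as a simple keep-everything-plausible filter."""
    return [a for a in alerts if "source reaches sink" in a]

repo = {"library_calls": ["getParameter", "executeQuery", "join"]}
alerts = triage_alerts(
    run_taint_analysis(repo, infer_taint_specs(extract_candidate_apis(repo)))
)
```

The design point the stages encode: the LLM is used only where static analysis is weakest (naming sources and sinks, judging context), while the whole-repository dataflow reasoning stays symbolic.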
The paper also discusses the challenges of static analysis, including false positives due to imprecise context-sensitive reasoning and the difficulty of manually specifying taint sources and sinks. IRIS addresses these challenges by using LLMs to infer specifications and by incorporating contextual analysis to filter out false positives. The results show that IRIS can detect many vulnerabilities that are beyond the reach of traditional static analysis tools, while keeping false alarms to a minimum. The authors conclude that combining LLMs with static analysis can significantly improve the effectiveness of vulnerability detection.