27 Feb 2024 | Yu Nong, Mohammed Aldeen, Long Cheng, Hongxin Hu, Feng Chen, Haipeng Cai
This paper explores how to leverage large language models (LLMs) and chain-of-thought (CoT) prompting to address three key software vulnerability analysis tasks: identifying vulnerabilities of a given type, discovering vulnerabilities of any type, and patching detected vulnerabilities. The authors propose a unified, vulnerability-semantics-guided prompting approach called VSP, which maps vulnerability semantics to chains of thought. Through extensive experiments, they demonstrate that VSP outperforms five baselines in F1 score across vulnerability identification, discovery, and patching. For example, on the CVE dataset, VSP achieves 58.48% F1 for vulnerability identification, 45.25% F1 for vulnerability discovery, and 20.00% F1 for vulnerability patching, significantly higher than the baselines. The authors also conduct in-depth case studies of VSP failures, revealing current gaps in LLM/CoT reasoning on challenging vulnerability cases and proposing improvements. Their results suggest that VSP is a promising direction for software vulnerability analysis, with the potential to substantially raise the effectiveness of vulnerability analysis. The study also points to future directions for improving LLM-based vulnerability analysis. All code and datasets are available at Figshare.
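To make the idea more concrete, below is a minimal, hypothetical sketch of what a vulnerability-semantics-guided CoT prompt might look like for the identification task. The step wording, the `build_vsp_prompt` helper, and the CWE example are illustrative assumptions, not the authors' actual templates or code.

```python
# Hypothetical sketch of a VSP-style chain-of-thought prompt builder.
# The reasoning steps below only approximate the paper's idea of guiding
# the model through vulnerability semantics; they are not the real templates.

def build_vsp_prompt(code_snippet: str, cwe_id: str) -> str:
    """Assemble a CoT prompt that walks the model through vulnerability semantics:
    locate potentially vulnerable statements, trace the relevant data/control flow,
    then decide and justify."""
    steps = [
        f"1. Identify statements in the code that could trigger {cwe_id}.",
        "2. Trace the data flow and control flow reaching those statements "
        "(the vulnerability semantics).",
        "3. Check whether that traced context contains the guards or sanitization "
        "needed to rule the vulnerability out.",
        "4. Conclude: is the code vulnerable? Explain step by step.",
    ]
    return (
        "You are a security analyst. Reason through the following steps.\n"
        + "\n".join(steps)
        + f"\n\nCode:\n```c\n{code_snippet}\n```"
    )


if __name__ == "__main__":
    snippet = "void f(char *s) { char buf[8]; strcpy(buf, s); }"
    print(build_vsp_prompt(snippet, "CWE-787 (out-of-bounds write)"))
```

A prompt like this would then be sent to the LLM under study; the key design choice the paper emphasizes is that the chain of thought is anchored on the vulnerability's semantic context rather than on generic step-by-step reasoning.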