Multitask-based Evaluation of Open-Source LLM on Software Vulnerability


6 Jul 2024 | Xin Yin, Chao Ni*, and Shaohua Wang
This paper presents a comprehensive evaluation of Large Language Models (LLMs) on software vulnerability tasks using the Big-Vul dataset. The evaluation covers four aspects: vulnerability detection, assessment, location, and description. The study finds that while LLMs generally perform well, they still lag behind state-of-the-art approaches and pre-trained Language Models (LMs) in some areas. LLMs do comparatively well on vulnerability assessment and location, with models such as CodeLlama and WizardCoder showing superior performance, but their tendency to produce excessive output weakens their results on vulnerability description. The paper also highlights the importance of fine-tuning LLMs for specific tasks and the impact of model size on performance. Overall, the evaluation offers insight into the capabilities and limitations of LLMs in handling software vulnerabilities, emphasizing the need to better understand subtle code vulnerabilities and to improve vulnerability description.
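To give a sense of how the vulnerability-detection task in such an evaluation might be framed, the sketch below wraps a C/C++ function from a Big-Vul-style record in a yes/no prompt and parses the model's reply into a binary label. This is a minimal illustration, not the paper's actual protocol: the `query_llm` callable, the prompt wording, and the sample record are assumptions introduced here for demonstration.

```python
# Minimal sketch of a binary vulnerability-detection probe.
# `query_llm` is a hypothetical stand-in for whatever chat/completion
# interface the evaluated model (e.g., CodeLlama, WizardCoder) exposes.

from typing import Callable

PROMPT_TEMPLATE = (
    "You are a security analyst. Answer with a single word, YES or NO.\n"
    "Is the following C/C++ function vulnerable?\n\n{code}\n"
)

def detect_vulnerability(code: str, query_llm: Callable[[str], str]) -> bool:
    """Return True if the model labels the function as vulnerable."""
    reply = query_llm(PROMPT_TEMPLATE.format(code=code))
    # Parse strictly: anything other than a leading YES counts as "not vulnerable",
    # reflecting the observation that verbose model output is hard to score.
    return reply.strip().upper().startswith("YES")

if __name__ == "__main__":
    # Hypothetical Big-Vul-style record: a function body plus a ground-truth label.
    sample = {
        "func": "void copy(char *dst, const char *src) { strcpy(dst, src); }",
        "target": 1,  # 1 = vulnerable, 0 = non-vulnerable
    }

    # Dummy model for demonstration; replace with a real LLM call.
    fake_llm = lambda prompt: "YES"

    predicted = detect_vulnerability(sample["func"], fake_llm)
    print("prediction:", int(predicted), "ground truth:", sample["target"])
```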