April 14-20, 2024 | Adriana Sejfia, Satyaki Das, Saad Shafiq, and Nenad Medvidović
This paper investigates how well deep learning (DL)-based vulnerability detectors identify multi-base unit (MBU) vulnerabilities, that is, vulnerabilities that span multiple base units (e.g., code lines, functions, or program slices). Existing DL-based detectors classify individual base units (IBUs), yet many real-world vulnerabilities span several base units, which makes them hard for these detectors to identify in full. The study evaluates three prominent DL-based detectors (ReVeal, DeepWukong, and LineVul) and finds that the datasets of all three contain MBU vulnerabilities, but that the detectors' accuracy drops significantly on them. It also finds that the detectors do not properly account for MBU vulnerabilities during training, validation, and testing, so their reported accuracy is misleading.
The paper defines and categorizes MBU vulnerabilities and shows that they make up a significant portion of the vulnerabilities in these detectors' datasets: 22% of the vulnerabilities in ReVeal's dataset, 53% in DeepWukong's, and 61% and 37% in LineVul's datasets. The study further shows that the detectors' accuracy metrics, such as precision, true positive rate (TPR), and Matthews correlation coefficient (MCC), drop when they are computed over complete vulnerabilities rather than over individual base units. This indicates that the detectors are not trained to handle MBU vulnerabilities effectively.
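The gap between scoring per base unit and scoring per complete vulnerability can be illustrated with a minimal sketch. It is not the paper's evaluation code. It assumes, for illustration only, that a vulnerability counts as detected only when every one of its base units is flagged, and that non-vulnerable base units are scored individually. The sample data, the identifiers "vuln-A" and "vuln-B", and all function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def scores(tp, fp, tn, fn):
    """Precision, TPR, and MCC from confusion-matrix counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    tpr = tp / (tp + fn) if tp + fn else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "tpr": tpr, "mcc": mcc}


def base_unit_scores(samples):
    """Score every base unit independently, ignoring which vulnerability it belongs to."""
    tp = sum(1 for _, y, p in samples if y == 1 and p == 1)
    fn = sum(1 for _, y, p in samples if y == 1 and p == 0)
    fp = sum(1 for _, y, p in samples if y == 0 and p == 1)
    tn = sum(1 for _, y, p in samples if y == 0 and p == 0)
    return scores(tp, fp, tn, fn)


def vulnerability_scores(samples):
    """Score complete vulnerabilities.

    Assumption: a vulnerability is a true positive only if ALL of its
    base units are flagged; otherwise it is a false negative.
    """
    hits_by_vuln = defaultdict(list)
    fp = tn = 0
    for vuln_id, y, p in samples:
        if y == 1:
            hits_by_vuln[vuln_id].append(p == 1)
        elif p == 1:
            fp += 1
        else:
            tn += 1
    tp = sum(1 for hits in hits_by_vuln.values() if all(hits))
    fn = len(hits_by_vuln) - tp
    return scores(tp, fp, tn, fn)


# Hypothetical samples: (vulnerability id or None, true label, predicted label)
samples = [
    ("vuln-A", 1, 1), ("vuln-A", 1, 0),  # MBU vulnerability, one unit missed
    ("vuln-B", 1, 1),                    # IBU vulnerability, detected
    (None, 0, 0), (None, 0, 0), (None, 0, 1),
]

unit_level = base_unit_scores(samples)
vuln_level = vulnerability_scores(samples)
print("per base unit:    ", unit_level)
print("per vulnerability:", vuln_level)
```

On this toy input the per-base-unit TPR is 2/3, while the per-vulnerability TPR is 1/2, because the partially detected MBU vulnerability no longer earns credit for the one unit that was flagged.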
To address these issues, the paper proposes an automated framework for including MBU vulnerabilities in DL-based detectors. The framework includes components such as a Patch Collector and a Patch Cleaner, which obtain and clean vulnerability patches. The study also demonstrates that using a similarity-based approach to distinguish compound patches improves the accuracy of detecting MBU vulnerabilities. The results show that the similarity-based approach achieves higher precision and recall than clustering-based methods.
The findings highlight the need for DL-based vulnerability detectors to account for MBU vulnerabilities in their training and evaluation processes. The proposed framework aims to improve the detection of MBU vulnerabilities and enhance the effectiveness of DL-based vulnerability detection techniques. The study also emphasizes the importance of realistic training and evaluation scenarios for DL-based detectors to ensure their accuracy and reliability in real-world applications.