Toward Improved Deep Learning-Based Vulnerability Detection

This paper explores the limitations of deep learning (DL) techniques in detecting vulnerabilities that span multiple base units (MBU vulnerabilities). The authors hypothesize that existing DL-based detectors, which operate on individual base units (IBUs), may struggle with MBU vulnerabilities, which would reduce their accuracy. They evaluate three prominent DL-based detectors, ReVeal, DeepWukong, and LineVul, through a systematic analysis of their datasets and training processes. Key findings include:

1. **Presence of MBU Vulnerabilities**: MBU vulnerabilities make up a substantial share of the datasets, accounting for 22% to 61% of all vulnerabilities across the three detectors.
2. **Usage in Training and Evaluation**: The detectors do not properly account for MBU vulnerabilities in their training, validation, and testing sets, so the accuracy they report is misleading.
3. **Accuracy on MBU Vulnerabilities**: When evaluated on complete vulnerabilities, the detectors' accuracy drops significantly, particularly in precision and Matthews correlation coefficient (MCC).
4. **Impact of Realistic Training**: Retraining the detectors with a focus on complete vulnerabilities improves performance, especially for ReVeal, although LineVul's precision decreases.

The authors propose an automated framework to help DL-based detectors better handle MBU vulnerabilities. It includes a Patch Collector and a Patch Cleaner component that identify and clean compound patches, with the aim of making these detectors more effective in real-world scenarios.
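Finding 3 reports the accuracy drop in terms of precision and MCC. For reference, these are the standard definitions over true and false positives and negatives ($TP$, $FP$, $TN$, $FN$). MCC ranges from -1 to 1 and, unlike plain accuracy, stays informative when vulnerable samples are a small minority of the data:

```latex
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}
```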
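To make the difference between IBU-level and complete-vulnerability evaluation concrete, here is a minimal Python sketch. It rests on assumptions that are ours, not the paper's stated protocol: each vulnerability is represented by the set of base units its fix touches, the detector emits one binary flag per base unit, and a vulnerability counts as detected only when all of its base units are flagged. The names `Vulnerability`, `mbu_share`, `unit_level_samples`, `vulnerability_level_samples`, and the other helpers are hypothetical, and the data is a toy example rather than anything from the evaluated datasets.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Vulnerability:
    """Hypothetical record: a vulnerability and the base units (e.g., functions) its fix touches."""
    vuln_id: str
    units: frozenset

    @property
    def is_mbu(self) -> bool:
        # MBU: the vulnerability spans more than one base unit.
        return len(self.units) > 1


def mbu_share(vulns):
    """Fraction of vulnerabilities that span multiple base units."""
    return sum(v.is_mbu for v in vulns) / len(vulns)


def precision_and_mcc(labels, preds):
    """Precision and MCC from parallel lists of 0/1 labels and predictions."""
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y and p)
    fp = sum(1 for y, p in pairs if not y and p)
    tn = sum(1 for y, p in pairs if not y and not p)
    fn = sum(1 for y, p in pairs if y and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, mcc


def unit_level_samples(vulns, clean_units, flagged):
    """IBU view: every base unit is scored as an independent sample."""
    vuln_units = set().union(*(v.units for v in vulns))
    units = sorted(vuln_units) + sorted(clean_units)
    labels = [1 if u in vuln_units else 0 for u in units]
    preds = [1 if u in flagged else 0 for u in units]
    return labels, preds


def vulnerability_level_samples(vulns, clean_units, flagged):
    """Complete-vulnerability view (assumed rule): one sample per vulnerability,
    detected only if *all* of its base units are flagged; clean units stay individual."""
    labels = [1] * len(vulns) + [0] * len(clean_units)
    preds = [1 if v.units <= flagged else 0 for v in vulns]
    preds += [1 if u in flagged else 0 for u in sorted(clean_units)]
    return labels, preds


# Toy data: one IBU and two MBU vulnerabilities, four non-vulnerable units.
vulns = [
    Vulnerability("V1", frozenset({"f1"})),
    Vulnerability("V2", frozenset({"f2", "f3"})),
    Vulnerability("V3", frozenset({"f4", "f5", "f6"})),
]
clean = {"c1", "c2", "c3", "c4"}
flagged = {"f1", "f2", "f4", "f5", "c1"}  # hypothetical detector output

print(f"MBU share: {mbu_share(vulns):.2f}")  # 0.67
for name, view in [("unit-level", unit_level_samples),
                   ("vulnerability-level", vulnerability_level_samples)]:
    p, m = precision_and_mcc(*view(vulns, clean, flagged))
    print(f"{name}: precision={p:.2f}, MCC={m:.2f}")
# unit-level: precision=0.80, MCC=0.41
# vulnerability-level: precision=0.50, MCC=0.09
```

On this toy data the same detector output looks much better when each base unit is scored independently, because the flagged units of the partially detected MBU vulnerabilities (`V2`, `V3`) count as true positives there, while under the all-units rule those vulnerabilities are misses and the false positive on `c1` still counts. A looser aggregation rule (for example, counting a vulnerability as detected if any of its units is flagged) would give different numbers; the strict rule is only one illustrative choice.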

April 14–20, 2024, Lisbon, Portugal | Adriana Sejfia, Satyaki Das, Saad Shafiq, Nenad Medvidović