An Extensive Comparison of Static Application Security Testing Tools


2024 | Matteo Esposito, Valentina Falaschi, Davide Falessi
This paper presents an extensive evaluation of Static Application Security Testing Tools (SASTTs) to assess their effectiveness in detecting software vulnerabilities. The study aims to establish a reliable benchmark for evaluating and improving SASTTs or alternative approaches, such as machine-learning-based solutions. The evaluation is based on a controlled, synthetic Java codebase with 1.5 million test executions and features innovative methodological aspects such as effort-aware accuracy metrics and method-level analysis. The key findings reveal that SASTTs achieve high Precision but fall short in Recall, indicating that false negatives are more common than false positives. Specifically, SASTTs detect a limited range of vulnerability types, with most CWEs (Common Weakness Enumeration entries) not identified by any of the tested tools. The study also highlights that the accuracy of SASTTs varies significantly, with some tools performing much better than others. The paper provides several recommendations for practitioners and researchers, including using multiple SASTTs to identify a broader spectrum of vulnerabilities, complementing SASTTs with other techniques such as code inspection, and prioritizing improvements in Recall over Precision when enhancing SASTTs or exploring alternative approaches. The study concludes by emphasizing the need for further research to address the limitations of SASTTs, particularly in reducing false negatives.
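
To make the Precision/Recall trade-off concrete, the minimal sketch below (not taken from the paper; the counts are hypothetical) computes both metrics from a tool's true positives, false positives, and false negatives, showing how a SASTT can report few false alarms yet still miss most vulnerabilities.

```java
// Minimal sketch with hypothetical counts (not from the study):
// Precision vs. Recall for a SASTT whose warnings are compared
// against the known (ground-truth) vulnerable methods.
public class SasttMetrics {
    public static void main(String[] args) {
        int truePositives = 40;   // vulnerable methods the tool correctly flagged
        int falsePositives = 5;   // safe methods the tool incorrectly flagged
        int falseNegatives = 160; // vulnerable methods the tool missed

        double precision = (double) truePositives / (truePositives + falsePositives);
        double recall = (double) truePositives / (truePositives + falseNegatives);

        // High Precision (~0.89) but low Recall (0.20): few false alarms,
        // yet most vulnerabilities go undetected -- the pattern the study reports.
        System.out.printf("Precision = %.2f, Recall = %.2f%n", precision, recall);
    }
}
```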