16 May 2024 | Moritz Schlegel, Nils Bars, Nico Schiller, Lukas Bernhard, Tobias Scharnowski, Addison Crump, Arash Ale-Ebrahimi, Nicolai Bissantz, Marius Muench, Thorsten Holz
This paper systematically analyzes the evaluation practices of 150 fuzzing papers published between 2018 and 2023, focusing on top venues in computer security and software engineering. The authors identify several shortcomings in the reproducibility and validity of these evaluations, particularly regarding statistical tests and systematic errors. They find that many papers do not follow established guidelines, such as using a sufficient number of trial runs and appropriate statistical tests. Additionally, the paper examines eight case studies to assess the practical reproducibility of fuzzing research, revealing further issues with evaluation setups. Based on these findings, the authors propose updated guidelines for conducting reproducible and scientifically valid fuzzing evaluations, emphasizing fair resource allocation, clear documentation, and robust statistical methods. The paper aims to enhance the reliability and credibility of future fuzzing research by addressing common pitfalls and providing revised best practices.
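To make the statistical side of such guidelines concrete: a common recommendation for fuzzing evaluations is to run many independent trials per fuzzer and report an effect size alongside a significance test. Below is a minimal sketch in Python of the Vargha-Delaney Â12 effect size, which estimates the probability that a randomly chosen trial of fuzzer A beats a randomly chosen trial of fuzzer B. The coverage numbers are hypothetical, invented purely for illustration; they are not from the paper.

```python
def a12(a, b):
    """Vargha-Delaney A12 effect size: the probability that a random
    value drawn from `a` exceeds a random value drawn from `b`.
    Ties count as half a win. 0.5 means no difference; values near
    1.0 (or 0.0) indicate one side consistently dominates."""
    wins = sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in a for y in b)
    return wins / (len(a) * len(b))

# Hypothetical branch-coverage totals from 10 trials of two fuzzers.
fuzzer_a = [1510, 1498, 1523, 1505, 1517, 1490, 1512, 1508, 1501, 1520]
fuzzer_b = [1480, 1475, 1489, 1488, 1470, 1485, 1479, 1486, 1483, 1477]

print(a12(fuzzer_a, fuzzer_b))  # 1.0 -> fuzzer A wins every pairwise comparison
```

Because Â12 is rank-based, it pairs naturally with the Mann-Whitney U test (e.g. `scipy.stats.mannwhitneyu`), which is frequently recommended over a t-test since per-trial fuzzing results are rarely normally distributed.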