6 Mar 2024 | Asmita, Yaroslav Oliinyk, Michael Scott, Ryan Tsang, Chongzhou Fang, Houman Homayoun
This paper examines vulnerabilities in BusyBox, a widely used open-source package that bundles over 300 essential Linux commands into a single executable. The study shows that older BusyBox versions remain prevalent in real-world embedded devices and introduces two techniques to improve software testing: initial seed generation with Large Language Models (LLMs) and crash reuse. The first technique uses LLMs to generate target-specific initial seeds, which significantly increases the number of crashes identified. The second technique reuses crash data previously acquired from similar targets before fuzzing a new target, streamlining the process and saving time. The authors demonstrate both techniques using AFL++ on BusyBox and identify crashes in the latest version without traditional fuzzing. The paper also includes a manual crash triaging process to determine the nature of the crashes and their impact on the system. The results show that LLM-generated initial seeds lead to more crashes and more edges discovered, while crash reuse identifies vulnerabilities in new targets without extensive fuzzing. The study underscores the importance of updating BusyBox to address known vulnerabilities, and the potential of LLMs and crash reuse to improve software testing and vulnerability detection in embedded systems.
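The crash reuse idea described above can be illustrated with a minimal sketch. It is not the authors' tooling. It assumes that crash inputs from an earlier campaign are stored as individual files in one directory, that the new target reads its input from standard input, and that the code runs on a POSIX system. The function name `replay_crashes` and its parameters are hypothetical.

```python
import subprocess
from pathlib import Path


def replay_crashes(target_cmd, crash_dir, timeout=5.0):
    """Replay saved crash inputs against a new target.

    target_cmd: argument list for the new target (hypothetical; assumed to
                read its test input from stdin).
    crash_dir:  directory of crash files saved from an earlier, similar target.
    Returns the names of the files that still crash the new target.
    """
    reproduced = []
    for crash_file in sorted(Path(crash_dir).iterdir()):
        # Skip subdirectories and a README.txt note, if the directory has one.
        if not crash_file.is_file() or crash_file.name == "README.txt":
            continue
        try:
            result = subprocess.run(
                target_cmd,
                input=crash_file.read_bytes(),
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            # Hangs are ignored in this sketch; only crashes are reported.
            continue
        # On POSIX, a negative return code means the process was terminated
        # by a signal (for example SIGSEGV or SIGABRT).
        if result.returncode < 0:
            reproduced.append(crash_file.name)
    return reproduced
```

A call such as `replay_crashes(["./busybox_new", "awk"], "old_crashes")` would then list the old crash inputs that still terminate the new build with a signal; the binary path, applet, and directory name here are placeholders. Inputs that reproduce would still need the manual triage step the summary mentions to determine their nature and impact.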