17 Apr 2024 | Richard Fang, Rohan Bindu, Akul Gupta, Daniel Kang
LLM agents can autonomously exploit one-day vulnerabilities in real-world systems. This study demonstrates that a GPT-4 agent can exploit 87% of a benchmark of 15 one-day vulnerabilities, while every other model tested and open-source vulnerability scanners fail to exploit any. The vulnerabilities were sourced from the CVE database and academic papers, and span real-world websites, container management software, and vulnerable Python packages. GPT-4 requires the CVE description for high performance: without it, the agent exploits only 7% of the vulnerabilities. The results highlight the potential of LLM agents in cybersecurity and raise concerns about their widespread deployment.
The study also evaluates the cost of using GPT-4 to exploit vulnerabilities, finding it significantly cheaper than human labor. The findings suggest that improving the planning and exploration capabilities of LLM agents could further raise their success rate. Notably, GPT-4 can autonomously exploit non-web vulnerabilities as well, such as those in Python packages and container management software. Overall, the results indicate that LLM agents have the potential to be highly effective in offensive cybersecurity, but further research is needed to understand their capabilities and ensure their safe deployment.