17 Apr 2024 | Richard Fang, Rohan Bindu, Akul Gupta, Daniel Kang
LLM agents can autonomously exploit one-day vulnerabilities in real-world systems. This study demonstrates that a GPT-4 agent can exploit 87% of a benchmark of 15 one-day vulnerabilities, while every other model tested and open-source vulnerability scanners fail to exploit any. The vulnerabilities were sourced from the CVE database and academic papers, and span real-world websites, container management software, and vulnerable Python packages. GPT-4 requires the CVE description for high performance: without it, the agent exploits only 7% of the vulnerabilities. The results highlight the potential of LLM agents in cybersecurity and raise concerns about their widespread deployment.
The study also evaluates the cost of using GPT-4 to exploit vulnerabilities, finding it significantly cheaper than human labor. The findings suggest that improving the planning and exploration capabilities of LLM agents could further raise their success rate. Notably, GPT-4 can autonomously exploit non-web vulnerabilities as well, such as those in Python packages and container management software. Overall, the results indicate that LLM agents have the potential to be highly effective in offensive cybersecurity, but further research is needed to understand their capabilities and ensure their safe deployment.