2024 | Richard Fang, Rohan Bindu, Akul Gupta, Qiushi Zhan, Daniel Kang
Large language models (LLMs) can autonomously hack websites, performing complex tasks such as blind database schema extraction and SQL injection without human feedback. This capability is enabled by frontier models like GPT-4, which can identify and exploit vulnerabilities in real-world websites. The study shows that GPT-4 autonomously finds and exploits vulnerabilities with a 73.3% success rate on the tested vulnerabilities, while the open-source models tested fail entirely. These attacks require LLM agents to carry out multi-step plans and adapt to feedback from the target website.

The cost of autonomous hacking with GPT-4 is approximately $9.81 per website, significantly lower than the cost of equivalent human effort. The findings raise concerns about the security implications of widely deploying LLMs, and the authors emphasize the importance of responsible disclosure and the need for further research into the capabilities and risks of LLMs.
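To make concrete what "blind database schema extraction with feedback from the website" involves, here is a minimal illustrative sketch (not from the paper): boolean-based blind SQL injection, where the attacker recovers a hidden table name one character at a time purely from true/false page responses. The vulnerable endpoint is mocked as a local function, and `SECRET_TABLE` is a hypothetical value; no real target or LLM is involved.

```python
# Illustrative sketch: boolean-based blind schema extraction.
# The "website" is a mock function; SECRET_TABLE is a hypothetical value.

SECRET_TABLE = "users"  # hidden schema value the attacker tries to recover

def vulnerable_endpoint(payload: str) -> bool:
    """Mock page: returns True ('record found') iff the injected condition holds.

    Simulates a query like: ... WHERE id=1 AND substr(table_name,i,1)='c'
    """
    prefix = "1 AND substr(table_name,"
    if payload.startswith(prefix):
        rest = payload[len(prefix):]            # e.g. "3,1)='e'"
        idx_str, _, tail = rest.partition(",1)='")
        i, c = int(idx_str), tail[:-1]          # strip trailing quote
        return i <= len(SECRET_TABLE) and SECRET_TABLE[i - 1] == c
    return False

def extract_table_name(max_len: int = 32) -> str:
    """Recover the table name character by character from boolean feedback."""
    alphabet = "abcdefghijklmnopqrstuvwxyz_"
    name = ""
    for i in range(1, max_len + 1):
        for c in alphabet:
            if vulnerable_endpoint(f"1 AND substr(table_name,{i},1)='{c}'"):
                name += c
                break
        else:
            break  # no character matched: end of the name
    return name

print(extract_table_name())  # → users
```

Each probe yields one bit of information, so the attack is a long chain of dependent steps, each conditioned on the site's previous response. This is the kind of feedback-driven loop the study reports GPT-4 agents executing without human intervention.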