20 Jan 2024 | Chaofan Shou, Jing Liu, Doudou Lu, Koushik Sen
**LLM4FUZZ: Guided Fuzzing of Smart Contracts with Large Language Models**
**Authors:** Chaofan Shou, Jing Liu, Doudou Lu, Koushik Sen
**Abstract:**
As blockchain platforms grow exponentially, millions of lines of smart contract code are being deployed to manage extensive digital assets. However, vulnerabilities in this critical code have led to significant exploits and asset losses, so thorough automated security analysis of smart contracts is imperative. This paper introduces LLM4FUZZ, a novel methodology that leverages large language models (LLMs) to optimize automated smart contract security analysis by intelligently guiding and prioritizing fuzzing campaigns. Traditional fuzzing explores the vast state space inefficiently; LLM4FUZZ instead uses LLMs to direct fuzzers toward high-value code regions and input sequences that are more likely to trigger vulnerabilities. LLM4FUZZ can also use LLMs to guide fuzzers based on user-defined invariants, reducing the overhead of blind exploration. Evaluations on real-world DeFi projects show substantial gains in efficiency, coverage, and vulnerability detection compared to baseline fuzzing. LLM4FUZZ also uncovered five critical vulnerabilities that could lead to losses of more than \$247k.
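The central idea in the abstract, steering fuzzing effort toward code an LLM judges more likely to be vulnerable, can be illustrated as score-weighted scheduling. This is a minimal sketch, not LLM4FUZZ's actual algorithm: it assumes the LLM's judgment has already been reduced to one numeric score per function, and the function names, scores, the `pick_function` and `schedule` helpers, and the proportional-sampling rule are all hypothetical.

```python
import random
from collections import Counter

# Hypothetical per-function scores standing in for an LLM's assessment of how
# likely each function is to be vulnerable (higher = more suspicious).
# Names and numbers are illustrative assumptions, not LLM4FUZZ output.
LLM_SCORES = {
    "withdraw": 0.9,
    "transfer": 0.8,
    "balanceOf": 0.1,
    "name": 0.05,
}


def pick_function(scores, rng, floor=0.05):
    """Sample the next function to fuzz with probability proportional to its score.

    The floor keeps every function reachable, so a low (possibly wrong) score
    deprioritizes a function instead of excluding it.
    """
    names = list(scores)
    weights = [max(scores[n], floor) for n in names]
    return rng.choices(names, weights=weights, k=1)[0]


def schedule(scores, budget, seed=0):
    """Count how many of `budget` fuzzing iterations each function receives."""
    rng = random.Random(seed)
    return Counter(pick_function(scores, rng) for _ in range(budget))


if __name__ == "__main__":
    counts = schedule(LLM_SCORES, budget=10_000)
    for fn, n in counts.most_common():
        print(f"{fn:10s} {n}")
```

The floor is the key design choice in a scheme like this: LLM judgments can be wrong, so a low score should shrink a function's share of the budget rather than remove it from the campaign entirely.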
**Introduction:**
The exponential growth of decentralized applications and blockchain platforms has led to the deployment of millions of lines of smart contract code that manage billions of dollars in digital assets. Traditional manual auditing is error-prone and often overlooks corner-case flaws. Automated methods such as testing, dynamic analysis, and formal verification are increasingly used to overcome these limitations, and guided fuzz testing is one of the most prevalent and reliable of them. However, traditional fuzzing of smart contracts is challenging because contracts are stateful and because dataflow information is difficult to track accurately. Recent LLMs have shown potential for finding vulnerabilities and developing exploits, but their direct integration into fuzzing has been limited by high false-positive and false-negative rates.
**Background:**
Smart contracts are programs deployed on blockchain networks that execute autonomous digital agreements. They manage complex business logic and handle extensive financial assets without centralized intermediaries. Feedback-driven fuzzing incorporates real-time feedback from program execution to guide the generation of subsequent test cases, making the process more targeted and efficient. LLMs, characterized by their vast number of parameters and ability to capture intricate patterns in language, have shown promise in boosting traditional software fuzzing. However, their integration into smart contract security analysis remains unexplored.
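To make the feedback loop concrete, the following minimal sketch runs a coverage-guided fuzzer against a toy stateful target. Everything in it is a hypothetical illustration rather than LLM4FUZZ's implementation: the `toy_contract` target and its branch labels, the `mutate` operators, and the rule of keeping any input that reaches a new branch are generic coverage-guided fuzzing choices. Inputs are transaction sequences rather than single calls because, as with real smart contracts, the planted bug is reachable only after an earlier call has changed the contract's state.

```python
import random

FUNCS = ["deposit", "withdraw"]


def toy_contract(calls):
    """Toy stateful target standing in for a smart contract (an assumption).

    `calls` is a sequence of (function_name, amount) transactions. Returns the
    set of branch labels executed and whether the planted bug was triggered.
    """
    covered = set()
    balance = 0
    unlocked = False
    for fn, amount in calls:
        if fn == "deposit":
            covered.add("deposit")
            balance += amount
            if balance >= 100:
                covered.add("deposit:large")
                unlocked = True  # state set by an earlier transaction
        elif fn == "withdraw":
            covered.add("withdraw")
            if unlocked and amount > balance:
                covered.add("withdraw:overdraw")  # planted bug
                return covered, True
            balance -= min(amount, balance)
    return covered, False


def mutate(calls, rng):
    """Return a mutated copy: insert a call, change an amount, or drop a call."""
    calls = list(calls)
    op = rng.randrange(3)
    if op == 0 or not calls:
        pos = rng.randrange(len(calls) + 1)
        calls.insert(pos, (rng.choice(FUNCS), rng.randrange(200)))
    elif op == 1:
        i = rng.randrange(len(calls))
        calls[i] = (calls[i][0], rng.randrange(200))
    else:
        calls.pop(rng.randrange(len(calls)))
    return calls


def fuzz(iterations=20_000, seed=0):
    """Coverage-guided loop: inputs that reach new branches become new seeds."""
    rng = random.Random(seed)
    corpus = [[]]  # seed corpus: start from the empty transaction sequence
    seen = set()   # union of branch labels covered so far
    for i in range(iterations):
        child = rng.choice(corpus)
        for _ in range(rng.randint(1, 3)):  # stack a few mutations per child
            child = mutate(child, rng)
        covered, crashed = toy_contract(child)
        if crashed:
            return child, i, seen | covered
        if not covered <= seen:  # feedback: new coverage, so keep this input
            seen |= covered
            corpus.append(child)
    return None, iterations, seen


if __name__ == "__main__":
    crash, iters, cov = fuzz()
    print("bug input:", crash, "after", iters, "iterations")
    print("branches covered:", sorted(cov))
```

The coverage check is the feedback step: an input is kept as a seed only if it executed a branch no earlier input reached, so later mutations build on transaction prefixes that have already unlocked new state.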
**Motivating Example:**
The AES smart contract project serves as a motivating example of the challenges traditional fuzzing faces. The project was exploited, resulting in the loss of \$62k worth of assets. Traditional fuzzers struggled to identify and prioritize the most vulnerable functions, often wasting effort on well-tested, non-vulnerable functions.
**Methodology:**
LLM4FUZZ uses LLMs to guide and prioritize the fuzzing of smart contracts. The workflow involves converting each smart contract