October 14-18, 2024 | Haochen Sun, Jason Li, Hongyang Zhang
The paper introduces zkLLM, a zero-knowledge proof system tailored to verifying the outputs of large language models (LLMs) without revealing their internal parameters. This addresses legal and ethical concerns around deploying LLMs, particularly in applications where the authenticity of a model's output is crucial. The key contributions of the paper include:
1. **tlookup**: A parallelized lookup argument for non-arithmetic operations in deep learning, which handles operations such as activation functions with no asymptotic overhead (a toy illustration of the lookup idea follows this list).
2. **zkAttn**: A specialized zero-knowledge proof for the attention mechanism that balances running time, memory usage, and accuracy while preserving security and privacy (a sketch of the underlying digit decomposition appears at the end of this summary).
3. **Efficient CUDA Implementation**: A CUDA implementation of these protocols that generates a proof for an LLM with up to 13 billion parameters in under 15 minutes, producing compact proofs under 200 kB.
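To make the lookup idea concrete, here is a minimal Python sketch of a logUp-style fractional-sum check, one standard way to realize a lookup argument. It illustrates the general principle only, not the paper's actual tlookup protocol: the field, the choice of ReLU as the non-arithmetic operation, and the table range are all assumptions for the demo, and the check runs on plaintext values rather than committed tensors.

```python
# Minimal, illustrative lookup-argument sketch (NOT the paper's tlookup).
# It checks the logUp-style identity
#     sum_i 1/(alpha - q_i)  ==  sum_j m_j / (alpha - t_j)   (mod p)
# which holds iff every query q_i appears in the table (t_j), where m_j is
# the multiplicity of table entry t_j among the queries.

import random

P = 2**61 - 1  # a Mersenne prime, standing in for the proof system's field

def inv(x: int) -> int:
    """Modular inverse via Fermat's little theorem (x must be nonzero mod P)."""
    return pow(x, P - 2, P)

# Table of (x, ReLU(x)) pairs over a small demo range; a pair is folded
# into one field element as x + gamma * y for a random challenge gamma.
DOMAIN = range(-8, 8)
relu = lambda x: max(x, 0)

def fold(x: int, y: int, gamma: int) -> int:
    return (x + gamma * y) % P

def lookup_check(inputs, outputs) -> bool:
    """Check that every claimed (input, output) pair lies in the ReLU table."""
    gamma = random.randrange(1, P)  # challenge folding (x, y) into one element
    alpha = random.randrange(1, P)  # challenge for the fractional-sum identity

    table = [fold(x, relu(x), gamma) for x in DOMAIN]
    queries = [fold(x, y, gamma) for x, y in zip(inputs, outputs)]

    # Count the multiplicity of each table entry among the queries.
    mult = {t: 0 for t in table}
    for q in queries:
        if q not in mult:
            return False  # query outside the table: reject immediately
        mult[q] += 1

    lhs = sum(inv((alpha - q) % P) for q in queries) % P
    rhs = sum(m * inv((alpha - t) % P) for t, m in mult.items()) % P
    return lhs == rhs

xs = [-3, 0, 5, 7]
print(lookup_check(xs, [relu(x) for x in xs]))  # True: honest ReLU outputs
print(lookup_check(xs, [1, 2, 3, 4]))           # False: wrong outputs
```

In a real protocol the prover would commit to the queries and prove the two sums are equal without revealing them; the point of the identity is that it reduces "every query is in the table" to a single randomized sum check, which parallelizes naturally.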
The paper also provides a detailed technical overview, covering the design of tlookup and zkAttn and the challenges of verifying non-arithmetic operations and the attention mechanism in zero knowledge. The authors back their approach with a comprehensive error analysis, showing that the approximation errors introduced by zkAttn stay within acceptable bounds.
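Finally, to illustrate the kind of decomposition that makes the attention softmax tractable for lookups, here is a toy Python sketch: one huge exponentiation table is replaced by a few small per-digit tables, since exp(-x) factors over the base-B digits of x. The base, digit count, and fixed-point scale below are invented for the demo and are not the paper's parameters; the real zkAttn works over a finite field on committed tensors, proves each lookup with a tlookup-style argument, and its quantized tables are what introduce the bounded errors the authors analyze.

```python
# Toy sketch of digit-decomposed exponentiation for a softmax, in the spirit
# of zkAttn. Since x = sum_k d_k * B**k, we have
#     exp(-x) = prod_k exp(-d_k * B**k),
# so K small tables of B entries each replace one table of B**K entries.
# B, K, and F are demo assumptions, not the paper's parameters.

import math

B = 16    # digit base (assumed)
K = 3     # number of digit positions (assumed)
F = 256   # fixed-point denominator: an integer score s encodes s / F

# One small table per digit position: TABLES[k][d] = exp(-(d * B**k) / F).
TABLES = [[math.exp(-(d * B**k) / F) for d in range(B)] for k in range(K)]

def digits(s: int) -> list[int]:
    """Base-B digits of s, least significant first."""
    return [(s // B**k) % B for k in range(K)]

def exp_neg(s: int) -> float:
    """Approximate exp(-s / F) as a product of K per-digit table lookups."""
    out = 1.0
    for k, d in enumerate(digits(s)):
        out *= TABLES[k][d]
    return out

def softmax(scores: list[int]) -> list[float]:
    """Toy softmax over integer attention scores: shift by the maximum (the
    usual trick, making every exponent nonnegative), exponentiate via lookups,
    then normalize."""
    m = max(scores)
    e = [exp_neg(m - s) for s in scores]
    z = sum(e)
    return [v / z for v in e]

print(softmax([512, 256, 0]))  # scores encoding 2.0, 1.0, 0.0
```

In the demo the table entries are exact floats, so the product recovers exp(-s/F) up to floating-point rounding; once the entries are quantized to field elements, as a zero-knowledge protocol requires, each lookup contributes a small rounding error, which is the quantity the paper's error analysis bounds.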