Private Fine-tuning of Large Language Models with Zeroth-order Optimization

12 Aug 2024 | Xinyu Tang, Ashwinee Panda, Milad Nasr, Saeed Mahloujifar, Prateek Mittal
This paper introduces DP-ZO (Differentially Private Zeroth-Order Optimization), a framework for private fine-tuning of large language models. DP-ZO privatizes zeroth-order optimization by adding noise to the scalar step size, making it far more memory-efficient than DP-SGD (Differentially Private Stochastic Gradient Descent). The key insight is that in zeroth-order optimization, the only information the training data contributes to each update is a scalar step size; privatizing that single scalar is sufficient to guarantee differential privacy. DP-ZO provides a strong privacy-utility trade-off across tasks, model sizes, and dataset sizes, with performance comparable to DP-SGD, and it achieves higher utility under pure ε-DP when instantiated with the Laplace mechanism. An empirical privacy analysis further shows that DP-ZO reduces privacy leakage relative to DP-SGD. DP-ZO scales to large models, such as 30B and 66B parameter models, and is more resource-efficient and easier to implement than DP-SGD. The authors conclude that DP-ZO is a compelling alternative to DP-SGD for private fine-tuning of large language models.
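To make the key insight concrete, the following is a minimal sketch of one DP-ZO update step. It is a hypothetical illustration, not the authors' implementation: `loss_fn`, the parameter names, and the use of plain NumPy are assumptions. The sketch uses SPSA-style finite differences along a shared random direction, clips each example's scalar difference, and adds Gaussian noise to the clipped sum before applying the update.

```python
import numpy as np

def dp_zo_step(theta, loss_fn, batch, lr=1e-4, eps=1e-3,
               clip=1.0, sigma=1.0, rng=None):
    """One DP-ZO update (hypothetical sketch, not the authors' code).

    theta   : flat parameter vector (np.ndarray)
    loss_fn : loss_fn(theta, example) -> scalar loss
    batch   : list of training examples
    clip    : per-example clipping threshold C on the scalar difference
    sigma   : Gaussian noise multiplier; sampling Laplace noise here
              instead would target pure eps-DP
    """
    rng = rng or np.random.default_rng()
    # Shared random perturbation direction (public: independent of the data).
    z = rng.standard_normal(theta.shape)

    # Per-example scalar finite differences -- the only data-dependent
    # quantity in the zeroth-order update.
    diffs = np.array([
        (loss_fn(theta + eps * z, x) - loss_fn(theta - eps * z, x)) / (2 * eps)
        for x in batch
    ])

    # Clip each scalar so one example changes the sum by at most `clip`,
    # then add noise calibrated to that sensitivity.
    clipped = np.clip(diffs, -clip, clip)
    noisy_scalar = (clipped.sum() + rng.normal(0.0, sigma * clip)) / len(batch)

    # The update reuses the public direction z, scaled by the private scalar.
    return theta - lr * noisy_scalar * z
```

Because the noisy quantity is a single scalar rather than a full per-example gradient, the step avoids DP-SGD's per-example gradient storage, which is where the memory savings come from.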