Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning

2024 | Lynn Chua, Badih Ghazi, Yangsibo Huang, Prithish Kamath, Ravi Kumar, Daogao Liu, Pasin Manurangsi, Amer Sinha, Chiyuan Zhang
This paper studies user-level differential privacy (DP) for language model (LM) fine-tuning, addressing the unequal privacy protection that arises when users contribute varying numbers of records. While existing studies often treat each record as the privacy unit, user-level DP ensures uniform privacy protection across all users. The authors evaluate two mechanisms for achieving user-level DP: Group Privacy and User-wise DP-SGD. Group Privacy caps each user's contribution at a fixed number of records k, so that a record-level guarantee translates into a user-level one, while User-wise DP-SGD samples users (rather than individual records) at each step, allowing more diverse data selection. The experiments show that User-wise DP-SGD generally outperforms Group Privacy, especially at smaller privacy budgets.
The authors also analyze how data selection strategies and the number of records per user affect the privacy-utility trade-off. They find that selecting longer sequences or random chunks can improve performance, and that larger values of k are beneficial when the privacy budget is larger. The study highlights the importance of choosing the privacy unit deliberately in language model fine-tuning and provides empirical insights into the effectiveness of different DP mechanisms.
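To make the user-wise DP-SGD idea concrete, the sketch below shows one noisy update step under user-level DP: each sampled user contributes a gradient averaged over its sampled records, that per-user contribution is clipped to a fixed norm, and Gaussian noise calibrated to the clip norm is added to the sum. This is a minimal illustration of the general recipe, not the paper's exact algorithm; the function name, parameters, and the plain Gaussian-mechanism noise are assumptions for illustration.

```python
import numpy as np

def user_dp_sgd_step(user_grads, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """One sketched step of user-wise DP-SGD.

    user_grads: list of per-user gradient vectors, each already averaged
    over that user's sampled records (e.g., k records per sampled user).
    Each user's contribution is clipped to `clip_norm`, the clipped
    contributions are summed, and Gaussian noise scaled to the per-user
    sensitivity (clip_norm * noise_multiplier) is added before averaging.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    clipped = []
    for g in user_grads:
        norm = np.linalg.norm(g)
        # Scale down any user whose contribution exceeds the clip norm.
        scale = min(1.0, clip_norm / max(norm, 1e-12))
        clipped.append(g * scale)
    total = np.sum(clipped, axis=0)
    # Noise is calibrated to the per-user clip norm, so the guarantee
    # holds for a user's entire contribution, not a single record.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(user_grads)
```

The key contrast with record-level DP-SGD is the unit of clipping: here one clipping operation bounds everything a user contributes in the step, which is what makes the resulting guarantee user-level.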