Differentially Private Synthetic Data via Foundation Model APIs 2: Text

2024 | Chulin Xie, Zinan Lin, Arturs Backurs, Sivakanth Gopi, Da Yu, Huseyin A. Inan, Harsha Nori, Haotian Jiang, Huishuai Zhang, Yin Tat Lee, Bo Li, Sergey Yekhanin
This paper introduces AUG-PE, an augmented version of the Private Evolution (PE) algorithm that generates differentially private (DP) synthetic text using only API access to large language models (LLMs). The authors address the limitations of existing methods, which require DP fine-tuning of LLMs on private data and are therefore infeasible for proprietary LLMs and computationally expensive for open-source ones. AUG-PE instead leverages the instruction-following capabilities of LLMs to generate and refine synthetic text samples, ensuring privacy by adding Gaussian noise to the nearest-neighbor voting histogram that guides sample selection. Evaluated on three benchmark datasets (Yelp, OpenReview, and PubMed), AUG-PE achieves competitive or superior performance compared to state-of-the-art DP fine-tuning baselines, especially when paired with more powerful LLMs such as GPT-3.5. It also demonstrates improved efficiency and robustness to empirical privacy attacks, making it a promising approach for generating high-quality DP synthetic text. The code and data are available at https://github.com/AI-secure/aug-pe.
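To make the privacy mechanism concrete, below is a minimal sketch of the DP nearest-neighbor voting step described above, assuming embeddings have already been computed for both the private and synthetic samples. The function and parameter names (dp_nearest_neighbor_histogram, noise_multiplier) are illustrative and not taken from the authors' codebase.

```python
# Sketch of the DP voting step at the core of Private Evolution / AUG-PE:
# each private record votes for its nearest synthetic candidate in embedding
# space, the vote histogram is noised with Gaussian noise for differential
# privacy, and the noisy counts decide which candidates survive.
import numpy as np

def dp_nearest_neighbor_histogram(private_emb, synthetic_emb, noise_multiplier, rng=None):
    """Return a DP histogram of nearest-neighbor votes.

    private_emb:      (n_private, d) embeddings of the private dataset
    synthetic_emb:    (n_synthetic, d) embeddings of the synthetic candidates
    noise_multiplier: std of the Gaussian noise; each private record casts
                      exactly one vote, so the histogram has L2 sensitivity 1.
    """
    rng = rng or np.random.default_rng()
    # Pairwise distances: (n_private, n_synthetic); each private sample
    # votes for its nearest synthetic candidate.
    dists = np.linalg.norm(
        private_emb[:, None, :] - synthetic_emb[None, :, :], axis=-1
    )
    votes = np.argmin(dists, axis=1)
    hist = np.bincount(votes, minlength=len(synthetic_emb)).astype(float)
    # Gaussian mechanism: one vote per private record => sensitivity 1.
    hist += rng.normal(scale=noise_multiplier, size=hist.shape)
    return hist

# Candidates with the highest noisy counts are kept (or resampled) and then
# expanded via LLM "variation" API calls in the next iteration.
```

Because only the noisy histogram touches the private data, all subsequent LLM API calls operate on synthetic samples alone and incur no additional privacy cost.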