This paper introduces a privacy-preserving framework for generating synthetic instructions to align large language models (LLMs) with user intentions, addressing the privacy risks of collecting and using user instructions. User instructions may contain sensitive information and are typically annotated by human workers, posing privacy risks that standard private optimization does not address. To mitigate this, the authors propose generating synthetic instructions through differentially private (DP) fine-tuning of language models and then resampling them to match the distribution of real instructions. A novel filtering algorithm performs this alignment between the synthetic and real distributions, so that the synthetic instructions retain high utility for both supervised fine-tuning and reinforcement learning from human feedback. Extensive experiments on publicly available, real-world datasets show that DP synthetic instructions achieve performance comparable to real instructions, with models trained on them outperforming leading open-source models. The approach provides strong privacy guarantees by limiting the influence of individual training samples and protecting against empirical attacks, demonstrating that synthetic instructions can be used effectively for LLM alignment while preserving user privacy. The study highlights the importance of addressing privacy risks in the annotation and training of LLMs, particularly when sensitive user data is involved.
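The abstract does not spell out how the filtering step operates, so the following is only a minimal, hypothetical sketch of the general idea of distribution-matching resampling, not the paper's algorithm. It embeds real and synthetic instructions, summarizes the real distribution by cluster frequencies, and importance-resamples the synthetic set toward it. All names (`resample_to_match`, the hashed bag-of-words features, the cluster count) are illustrative assumptions, and in a genuinely private pipeline the statistics of the real instructions would themselves need to be computed under DP, which this sketch omits.

```python
# Minimal sketch (not the paper's algorithm): importance-resample synthetic
# instructions so their distribution better matches that of real instructions.
# Both sets are embedded, the real set is clustered, and synthetic samples are
# drawn with weights proportional to the ratio of real to synthetic mass in
# each cluster. The hashed bag-of-words features are a stand-in for any
# sentence encoder.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import HashingVectorizer


def resample_to_match(real_texts, synth_texts, n_clusters=8, n_out=None, seed=0):
    """Resample synthetic texts toward the empirical distribution of real texts."""
    rng = np.random.default_rng(seed)
    vec = HashingVectorizer(n_features=512, alternate_sign=False)
    real_x = vec.transform(real_texts).toarray()
    synth_x = vec.transform(synth_texts).toarray()

    # Cluster the real instructions and assign synthetic ones to those clusters.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(real_x)
    real_hist = np.bincount(km.labels_, minlength=n_clusters) / len(real_texts)
    synth_labels = km.predict(synth_x)
    synth_hist = np.bincount(synth_labels, minlength=n_clusters) / len(synth_texts)

    # Importance weight per synthetic sample: real cluster mass / synthetic mass.
    weights = real_hist[synth_labels] / np.maximum(synth_hist[synth_labels], 1e-12)
    weights /= weights.sum()

    n_out = n_out or len(synth_texts)
    idx = rng.choice(len(synth_texts), size=n_out, replace=True, p=weights)
    return [synth_texts[i] for i in idx]
```

The resampled set could then feed supervised fine-tuning or RLHF in place of the raw synthetic instructions; the paper's actual filtering criterion and its privacy accounting are not described in this summary.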