Differentially Private Knowledge Distillation via Synthetic Text Generation


5 Jun 2024 | James Flemings, Murali Annavaram
This paper introduces DistilDP, a differentially private knowledge distillation algorithm that leverages synthetic text generation to improve the utility of compressed language models while preserving privacy. A teacher model is first fine-tuned with DP-SGD and then used to generate differentially private synthetic text; the student is trained on this data, so no additional privacy mechanism is needed during distillation itself. The teacher's knowledge is transferred in two ways: through the synthetic data (hard labels) and through the teacher's output distribution over that data (soft labels). When the teacher and student have similar architectures, their hidden representations can also be aligned to further improve performance. Because DP-SGD is applied only once, to the teacher, the framework avoids the computational and memory costs of existing methods that apply DP-SGD twice (once for the teacher and once for the student).

DistilDP is evaluated on three datasets: Yelp, Big Patent, and DBpedia. It outperforms existing baselines, including a student fine-tuned with DP-SGD, a student fine-tuned with DPKD, and a student trained only on DP synthetic data, achieving, for example, a 9.0 PPL reduction on Big Patent under a strong privacy budget (ε = 2). The results show that aligning the output distributions of the teacher and student, as well as their hidden representations, is key to the student's performance.

Ablation studies show that the distillation loss weighting parameter (λ) and the temperature parameter (t) significantly affect performance, that increasing the amount of synthetic text improves the student, and that adding an MSE loss on the hidden representations further boosts utility. The framework is model-agnostic and requires no specific architectural assumptions about the teacher and student, although aligning hidden representations helps when the architectures are compatible.

The paper concludes that differentially private knowledge distillation for autoregressive large language models can be effective: combining DP synthetic data generation with knowledge distillation substantially improves the utility of compressed models while maintaining privacy, making the approach promising for privacy-preserving compression of large language models.
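To make the training objective concrete, below is a minimal sketch of a student loss of the kind the summary describes: cross-entropy on the DP synthetic tokens (hard labels), a temperature-scaled KL term against the teacher's output distribution (soft labels), and an optional MSE term on hidden representations. The function name and the weighting parameters (lam, temperature, mse_weight) are illustrative assumptions, not the authors' actual implementation.

```python
# Hedged sketch of a DistilDP-style student objective (illustrative, not the paper's code).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      student_hidden=None, teacher_hidden=None,
                      lam=0.5, temperature=2.0, mse_weight=0.1):
    """Combine hard-label CE on DP synthetic text, soft-label KL against the
    DP teacher's distribution, and an optional MSE on hidden states."""
    # Hard labels: next-token cross-entropy on the DP synthetic text.
    ce = F.cross_entropy(student_logits.view(-1, student_logits.size(-1)),
                         targets.view(-1))

    # Soft labels: temperature-scaled KL divergence to the teacher's output distribution.
    t = temperature
    kl = F.kl_div(F.log_softmax(student_logits / t, dim=-1),
                  F.softmax(teacher_logits / t, dim=-1),
                  reduction="batchmean") * (t * t)

    # λ trades off the hard-label and soft-label terms, as in the paper's ablations.
    loss = (1.0 - lam) * ce + lam * kl

    # Optional hidden-representation alignment when architectures are compatible.
    if student_hidden is not None and teacher_hidden is not None:
        loss = loss + mse_weight * F.mse_loss(student_hidden, teacher_hidden)
    return loss
```

Since the synthetic text is already differentially private, this loss can be optimized with standard (non-DP) SGD on the student; the privacy cost is paid once, when the teacher is trained with DP-SGD.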