24 Jun 2024 | Gyeongman Kim, Doohyuk Jang, Eunho Yang
**PromptKD: Distilling Student-Friendly Knowledge for Generative Language Models via Prompt Tuning**
**Authors:** Gyeongman Kim, Doohyuk Jang, Eunho Yang
**Institution:** Korea Advanced Institute of Science and Technology (KAIST), AITRICS
**Abstract:**
Recent advancements in large language models (LLMs) have raised concerns about inference costs, leading to a growing need for model compression. While knowledge distillation (KD) is a prominent method for this, research on KD for generative language models is limited. PromptKD, a novel method, leverages prompt tuning to enable generative language models to transfer student-friendly knowledge. Unlike previous KD methods that require fine-tuning the entire teacher model, PromptKD achieves similar effects by adding a small number of prompt tokens and tuning only the prompt with student guidance. Extensive experiments on instruction-following datasets show that PromptKD achieves state-of-the-art performance while adding only 0.0007% of the teacher's parameters as prompts. Analysis suggests that distilling student-friendly knowledge effectively alleviates exposure bias throughout the training process, leading to performance enhancements.
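To put the 0.0007% figure in perspective, the back-of-the-envelope calculation below shows how the prompt overhead would be computed. The prompt length is a hypothetical value chosen for illustration; only the GPT-2 XL-scale reference sizes (~1.5B parameters, hidden dimension 1600) are standard model facts, and none of these numbers are claimed to be the paper's exact configuration.

```python
# Back-of-the-envelope parameter overhead of prompt tuning (illustrative).
# The prompt length is hypothetical; a GPT-2 XL-scale teacher (~1.5B
# parameters, hidden size 1600) is used only as a reference point.
teacher_params = 1_500_000_000   # ~1.5B-parameter teacher
hidden_size = 1600               # GPT-2 XL hidden dimension
prompt_length = 7                # number of soft prompt tokens (assumed)

prompt_params = prompt_length * hidden_size           # 11,200 trainable parameters
overhead_pct = 100 * prompt_params / teacher_params   # ~0.00075%

print(f"{prompt_params:,} prompt parameters = {overhead_pct:.5f}% of the teacher")
```

With values in this range, the trainable prompt's footprint lands on the order of the 0.0007% overhead quoted in the abstract.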
**Key Contributions:**
1. **Investigation of Student-Friendly Knowledge:** PromptKD explores the effect of student-friendly knowledge in KD for generation tasks.
2. **First Use of Prompt Tuning in KD:** It is the first to use prompt tuning in KD, enabling memory-efficient extraction of student-friendly knowledge.
3. **State-of-the-Art Performance:** PromptKD achieves state-of-the-art performance on instruction-following datasets.
4. **Exposure Bias Mitigation:** It demonstrates superior performance in mitigating exposure bias during training.
**Related Work:**
- **KD for Text Classification:** Most KD research targets text classification, with methods evolving from simply matching output logits to also distilling intermediate representations.
- **KD for Text Generation:** Methods such as Supervised KD and SeqKD minimize the discrepancy between teacher and student distributions, but they transfer the teacher's knowledge as-is rather than adapting it to the student.
- **Prompt Tuning:** Prompt tuning has become a prominent parameter-efficient fine-tuning technique, but it had not previously been applied to KD for generative language models (the mechanism is sketched below).
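For context on the mechanism, here is a minimal sketch of prompt tuning: a small matrix of learnable "soft prompt" embeddings is prepended to the token embeddings of a frozen language model, and only that matrix is trained. The module name, initialization scale, and shapes are illustrative, not the paper's implementation.

```python
import torch
import torch.nn as nn


class SoftPrompt(nn.Module):
    """Learnable prompt embeddings prepended to a frozen LM's input embeddings
    (illustrative sketch of prompt tuning, not the paper's exact module)."""

    def __init__(self, prompt_length: int, hidden_size: int):
        super().__init__()
        # The only trainable parameters: shape (prompt_length, hidden_size).
        self.prompt = nn.Parameter(0.02 * torch.randn(prompt_length, hidden_size))

    def forward(self, token_embeds: torch.Tensor) -> torch.Tensor:
        # token_embeds: (batch, seq_len, hidden_size) from the frozen backbone.
        batch = token_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        # Output length is prompt_length + seq_len; the backbone stays frozen.
        return torch.cat([prompt, token_embeds], dim=1)
```

Only `SoftPrompt.prompt` receives gradient updates, which is what makes the approach memory-efficient compared with fine-tuning the whole backbone.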
**PromptKD Method:**
- **Instruction-Following Setting:** PromptKD formulates instruction-following as a conditional text generation task.
- **Pseudo-Target Generation:** Responses generated by the student are used as pseudo-targets to address exposure bias.
- **Prompt Tuning for Adaptive Teaching:** Only the prompt is updated to minimize the KD loss, encouraging the teacher to generate outputs similar to the student's.
- **Student-Friendly Knowledge Distillation:** With the updated prompt attached, the teacher distills student-friendly knowledge to the student by minimizing their distribution discrepancy (a training-step sketch follows this list).
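Putting the four components above together, the sketch below outlines one PromptKD-style training step under several assumptions: Hugging Face-style model interfaces (`generate`, `get_input_embeddings`, `.logits`), reverse KL as the distillation loss, and illustrative names such as `promptkd_step` and `batch["instruction_ids"]`. Attention masks, loss masking over instruction tokens, and any regularization terms are omitted, so this illustrates the flow described above rather than the paper's exact algorithm.

```python
import torch
import torch.nn.functional as F


def reverse_kl(student_logits, teacher_logits):
    """Token-level distillation loss. Reverse KL(student || teacher) is an
    illustrative choice, not necessarily the paper's exact objective."""
    s_logp = F.log_softmax(student_logits, dim=-1)
    t_logp = F.log_softmax(teacher_logits, dim=-1)
    return F.kl_div(t_logp, s_logp, log_target=True, reduction="batchmean")


def prompted_teacher_logits(teacher, prompt, input_ids):
    """Run the frozen teacher with learnable prompt embeddings prepended to its
    token embeddings, then drop the logits at the prompt positions so the
    output aligns with the student's per-token logits."""
    tok_embeds = teacher.get_input_embeddings()(input_ids)            # (B, T, H)
    batch_prompt = prompt.unsqueeze(0).expand(tok_embeds.size(0), -1, -1)
    full = torch.cat([batch_prompt, tok_embeds], dim=1)               # (B, P+T, H)
    return teacher(inputs_embeds=full).logits[:, prompt.size(0):, :]  # (B, T, V)


def promptkd_step(teacher, student, prompt, batch,
                  prompt_optimizer, student_optimizer):
    """One PromptKD-style step (sketch). Teacher weights are assumed frozen;
    only `prompt` and the student's parameters are ever updated."""
    # (1) Pseudo-target generation: the student's own responses are used as
    #     training sequences, which is what addresses exposure bias.
    with torch.no_grad():
        pseudo_targets = student.generate(batch["instruction_ids"],
                                          max_new_tokens=128)

    # (2) Adaptive teaching: update ONLY the prompt to minimize the KD loss,
    #     steering the prompted teacher toward student-friendly knowledge.
    with torch.no_grad():
        s_logits = student(pseudo_targets).logits
    t_logits = prompted_teacher_logits(teacher, prompt, pseudo_targets)
    prompt_loss = reverse_kl(s_logits, t_logits)
    prompt_optimizer.zero_grad()
    prompt_loss.backward()
    prompt_optimizer.step()

    # (3) Student-friendly distillation: update the student against the
    #     freshly updated (and now frozen) prompted-teacher distribution.
    with torch.no_grad():
        t_logits = prompted_teacher_logits(teacher, prompt, pseudo_targets)
    s_logits = student(pseudo_targets).logits
    student_loss = reverse_kl(s_logits, t_logits)
    student_optimizer.zero_grad()
    student_loss.backward()
    student_optimizer.step()

    return prompt_loss.item(), student_loss.item()
```

The key design choice captured here is that, on the teacher side, only the prompt receives gradients, which is what keeps adaptive teaching memory-efficient compared with fine-tuning the entire teacher. In practice the prompt update may also require regularization so the prompted teacher does not simply collapse onto the student; such details are beyond this sketch.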
**Experiments:**
- **Dataset and Models:** PromptKD is evaluated on 5 instruction-following datasets using various models, including GPT-2, OPT, and Llama.
- **Baselines:** Compared against supervised fine-tuning and prior KD methods, PromptKD achieves state-of-the-art results.