PromptKD: Distilling Student-Friendly Knowledge for Generative Language Models via Prompt Tuning

24 Jun 2024 | Gyeongman Kim, Doohyuk Jang, Eunho Yang
PromptKD is a knowledge distillation (KD) method that distills student-friendly knowledge from generative language models via prompt tuning. Unlike prior approaches that fine-tune the entire teacher model to extract such knowledge, PromptKD prepends a small number of prompt tokens to the teacher and tunes only the prompt with guidance from the student, substantially reducing the computational cost of adapting the teacher.

Extensive experiments on instruction-following datasets show that PromptKD achieves state-of-the-art performance while adding only 0.0007% of the teacher's parameters as prompts. During training, the method incorporates a regularization loss to keep prompt tuning stable and uses the student's own generated responses as pseudo-targets; the analysis suggests that distilling student-friendly knowledge in this way alleviates exposure bias, which accounts for much of the performance gain.

PromptKD outperforms other KD baselines, including supervised KD and SeqKD, across a range of teacher-student model sizes and tasks, while remaining memory-efficient. Its success highlights the value of student-friendly knowledge for generative language models and offers a practical new route for model compression via knowledge distillation.
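To make the training recipe concrete, below is a minimal PyTorch sketch of a PromptKD-style step: a frozen teacher, a trainable soft prompt prepended to the teacher's input, a student trained to match the prompted teacher on its own pseudo-targets, and a regularization term keeping the prompted teacher close to the original teacher. The ToyLM stand-in model, the loss directions, and all hyperparameters are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of a PromptKD-style training step (assumptions: ToyLM stand-in
# models, simplified KL losses, random ids as stand-ins for student-generated
# pseudo-targets). Not the paper's exact objective.
import torch
import torch.nn.functional as F
from torch import nn

class ToyLM(nn.Module):
    """Tiny causal-LM stand-in: embeds tokens and predicts next-token logits."""
    def __init__(self, vocab_size=100, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, input_ids, prefix_embeds=None):
        x = self.embed(input_ids)
        if prefix_embeds is not None:  # prepend the soft prompt embeddings
            x = torch.cat([prefix_embeds.expand(x.size(0), -1, -1), x], dim=1)
        h, _ = self.rnn(x)
        logits = self.head(h)
        if prefix_embeds is not None:  # drop positions belonging to the prompt
            logits = logits[:, prefix_embeds.size(1):]
        return logits

vocab, prompt_len, dim = 100, 4, 64
teacher, student = ToyLM(vocab, dim), ToyLM(vocab, dim)
teacher.requires_grad_(False)  # teacher weights stay frozen
# The soft prompt is the only trainable parameter on the teacher side.
soft_prompt = nn.Parameter(torch.randn(1, prompt_len, dim) * 0.02)

optimizer = torch.optim.AdamW(list(student.parameters()) + [soft_prompt], lr=1e-3)

# Token ids standing in for the student's own responses, used as pseudo-targets
# (PromptKD uses student-generated outputs to mitigate exposure bias).
pseudo_targets = torch.randint(0, vocab, (8, 16))

for step in range(3):
    student_logits = student(pseudo_targets)
    with torch.no_grad():
        plain_teacher_logits = teacher(pseudo_targets)              # teacher without prompt
    prompted_teacher_logits = teacher(pseudo_targets, soft_prompt)  # gradients reach the prompt only

    # Distillation: the student matches the prompted ("student-friendly") teacher.
    kd_loss = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(prompted_teacher_logits, dim=-1),
        reduction="batchmean",
    )
    # Regularization: keep the prompted teacher close to the original teacher
    # so prompt tuning does not drift away from the teacher's knowledge.
    reg_loss = F.kl_div(
        F.log_softmax(prompted_teacher_logits, dim=-1),
        F.softmax(plain_teacher_logits, dim=-1),
        reduction="batchmean",
    )
    loss = kd_loss + reg_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"step {step}: kd={kd_loss.item():.4f} reg={reg_loss.item():.4f}")
```

Because the teacher's weights are frozen, gradients from both losses update only the student and the soft prompt, which is what keeps the extra parameter count to a tiny fraction of the teacher's size.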