28 May 2024 | Zhaorui Yang, Tianyu Pang, Haozhe Feng, Han Wang, Wei Chen, Minfeng Zhu, Qian Liu
The paper introduces Self-Distillation Fine-Tuning (SDFT), a novel approach that addresses the distribution gap between task datasets and large language models (LLMs) during fine-tuning. The authors identify this gap as the primary cause of the difficulty in balancing downstream performance with the preservation of general instruction-following abilities. SDFT guides fine-tuning with a distilled dataset generated by the model itself, which keeps the training targets close to the LLM's original distribution. The method effectively mitigates catastrophic forgetting while matching or surpassing vanilla fine-tuning on downstream tasks. Experiments on the Llama-2-chat model across a range of benchmarks demonstrate the effectiveness of SDFT. The code is available at <https://github.com/sail-sg/sdft>.
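
The core loop is straightforward to sketch: have the seed model rewrite each reference answer in its own words, then fine-tune on those distilled pairs instead of the raw targets. Below is a minimal, hypothetical Python sketch using Hugging Face `transformers`; the prompt template, dataset shape, and `distill_response` helper are illustrative assumptions, not the paper's exact implementation (see the linked repo for that).

```python
# Minimal sketch of the SDFT data-distillation step, assuming the
# Hugging Face `transformers` library. Prompt wording is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-chat-hf"  # seed model (assumed variant)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16, device_map="auto"
)

def distill_response(instruction: str, reference: str) -> str:
    """Ask the seed model to restate the task's reference answer in its own
    words, keeping the fine-tuning target close to the model's distribution."""
    prompt = (
        "Below is an instruction and a reference answer.\n"
        f"Instruction: {instruction}\n"
        f"Reference answer: {reference}\n"
        "Respond to the instruction in your own words, "
        "using the reference as a guide:\n"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    # Keep only the newly generated tokens: the distilled answer.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

# Placeholder task dataset; in practice this is the downstream task data.
task_data = [{"instruction": "...", "response": "..."}]
distilled = [
    {"instruction": ex["instruction"],
     "response": distill_response(ex["instruction"], ex["response"])}
    for ex in task_data
]
# `distilled` then replaces the original dataset in an ordinary SFT loop.
```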