Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning


7 Jun 2024 | Ming Li, Lichang Chen, Jiahui Chen, Shuai He, Jiuxiang Gu, Tianyi Zhou
This paper introduces Selective Reflection-Tuning, a novel paradigm that combines the reflection and introspection capabilities of a teacher LLM with the data selection capability of a student LLM to automatically refine existing instruction-tuning data. The method aims to produce high-quality, student-compatible instruction-response pairs, leading to more efficient and effective instruction tuning and superior LLM performance.

The key contributions of this work include:

1. **Teacher-Student Collaboration**: The teacher model reflects on the instruction and response of a given sample, generating improved versions. The student model then evaluates whether to incorporate these improvements based on its own statistical attributes.
2. **Evaluation Schema**: The paper introduces the Reversed-Instruction-Following Difficulty (r-IFD) score, which measures how feasible it is for the student model to learn the sample, based on how much the response helps in deducing the instruction.
3. **Data Augmentation**: The method enhances the quality of the dataset without collecting new data, making it versatile and adaptable to various contexts.

The authors apply their method to the Alpaca and WizardLM datasets, achieving top-tier performance with significantly fewer data samples than existing methods. The paper also includes a detailed experimental setup, evaluation metrics, and ablation studies that validate the effectiveness of the proposed approach.
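To make the r-IFD score concrete, the sketch below shows one plausible way to compute a perplexity-ratio score of this kind from per-token log-probabilities produced by the student model. The function names and the exact ratio form are illustrative assumptions, not the paper's reference implementation; the idea is that conditioning the instruction on the response should lower its perplexity when the sample is feasible for the student to learn from.

```python
import math

def perplexity(logprobs):
    """Perplexity from a list of per-token natural-log probabilities."""
    # Standard definition: exp of the average negative log-likelihood.
    return math.exp(-sum(logprobs) / len(logprobs))

def r_ifd(logprobs_instruction_given_response, logprobs_instruction):
    """Sketch of a reversed-IFD-style score (assumed form, not the paper's code).

    Ratio of the instruction's perplexity when conditioned on the response
    to its unconditioned perplexity. Lower values suggest the response
    makes the instruction easier to infer, i.e. the sample is more
    feasible for the student model.
    """
    return (perplexity(logprobs_instruction_given_response)
            / perplexity(logprobs_instruction))

# Toy example with hypothetical log-probabilities from a student LM:
# conditioning on the response raises each token's log-probability.
score = r_ifd([-1.0, -1.0], [-2.0, -2.0])
print(round(score, 4))  # exp(1) / exp(2) = exp(-1)
```

In practice the log-probabilities would come from scoring the instruction tokens with the student model, once with the response prepended to the context and once without; samples could then be ranked or filtered by this score.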
**Conclusion**: Selective Reflection-Tuning demonstrates significant advancements in data improvement for instruction tuning of large language models, offering a more efficient and effective solution for model training.