Rethinking Data Selection for Supervised Fine-Tuning

8 Feb 2024 | Ming Shen
This paper reevaluates data selection for supervised fine-tuning (SFT) of large language models (LLMs). While SFT is often viewed as superficial, teaching style rather than content, recent studies highlight the importance of data selection for downstream performance. The authors argue that SFT data selection should target human-like interaction style rather than conventional notions of data quality or diversity. They find that simply selecting the instances with the longest responses outperforms quality- and diversity-based selection, because long responses mimic the detailed, helpful character of human-written answers.

Concretely, fine-tuning on only the 1,000 instances with the longest responses in the Alpaca dataset significantly improves instruction-following ability compared to training on the full dataset or on subsets selected for quality or diversity. The study also stresses evaluating SFT models with human judges to mitigate biases such as verbosity bias. Overall, the findings suggest that selecting SFT data for human-like style is more effective than traditional selection methods.
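The selection rule described above is easy to sketch. The following is a minimal, illustrative Python implementation assuming Alpaca-style records with `instruction`, `input`, and `output` fields; response length is measured in characters here for simplicity, though the paper may count tokens, and the toy records are hypothetical.

```python
def select_longest(records, k=1000):
    """Return the k instances with the longest responses (Alpaca-style records)."""
    return sorted(records, key=lambda r: len(r["output"]), reverse=True)[:k]

# Toy records for illustration only (not from the actual Alpaca dataset):
data = [
    {"instruction": "Say hi", "input": "", "output": "Hi"},
    {"instruction": "Explain gravity", "input": "", "output": "Gravity is " + "a force " * 50},
    {"instruction": "Name a color", "input": "", "output": "Blue"},
]

# Keep only the 2 longest-response instances from the toy set.
subset = select_longest(data, k=2)
print([r["instruction"] for r in subset])  # → ['Explain gravity', 'Name a color']
```

On the real Alpaca dataset one would apply the same function with `k=1000` and fine-tune on the resulting subset.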