30 Mar 2024 | Feifan Song, Bowen Yu, Hao Lang, Haiyang Yu, Fei Huang, Houfeng Wang, Yongbin Li
This paper investigates how data diversity affects the fine-tuning of large language models (LLMs) for human alignment, focusing on the practical question of how to allocate a limited human-annotation budget between collecting more diverse prompts and collecting more diverse responses per prompt. The authors find that increasing the number of responses yields clearer fine-tuning signals and larger improvements in alignment performance than increasing the number of prompts. They also propose a new formulation of prompt diversity based on N-grams and show that it follows a scaling law: prompt diversity correlates linearly with the final performance of the fine-tuned LLM. In addition, they explore data augmentation techniques that increase data diversity and further improve fine-tuned LLM performance. Overall, the findings highlight the importance of balancing the number of prompts and responses under a fixed annotation budget and advance the understanding of how data diversity shapes LLM performance on human alignment tasks.
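The summary does not spell out the paper's exact N-gram-based diversity formula, so the sketch below is only a rough illustration of the idea: it scores a prompt set by its distinct-n-gram ratio (unique N-grams divided by total N-grams). The function name `prompt_diversity` and the distinct-n definition are assumptions for illustration, not the authors' formulation.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-grams from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def prompt_diversity(prompts, n=2):
    """Distinct-n diversity of a prompt set: unique n-grams / total n-grams.

    Hypothetical stand-in for the paper's N-gram-based prompt-diversity
    measure; the exact definition is not given in this summary.
    """
    counts = Counter()
    for prompt in prompts:
        counts.update(ngrams(prompt.lower().split(), n))
    total = sum(counts.values())
    return len(counts) / total if total else 0.0

if __name__ == "__main__":
    prompts = [
        "Explain the theory of relativity in simple terms.",
        "Summarize the plot of Hamlet.",
        "Explain the theory of relativity to a child.",
    ]
    # A higher ratio means less N-gram overlap, i.e. a more diverse prompt set.
    print(f"distinct-2 prompt diversity: {prompt_diversity(prompts, n=2):.3f}")
```

Under the scaling law described above, a measure of this kind would be expected to correlate roughly linearly with the fine-tuned model's alignment performance, which is what makes it useful for deciding how to spend an annotation budget.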