26 Feb 2024 | Liangxin Liu, Xuebo Liu, Derek F. Wong, Dongfang Li, Ziyi Wang, Baotian Hu, Min Zhang
SelectIT is a novel method for selecting high-quality instruction-tuning (IT) data for large language models (LLMs) that leverages the uncertainty inherent in LLMs themselves, requiring no additional resources. The method applies self-reflection at three levels (token, sentence, and model) to improve the accuracy and reliability of IT data selection. Applying SelectIT to the Alpaca-GPT4 dataset yields a new IT dataset, Selective Alpaca, which delivers significant improvements in LLM performance across a range of tasks. Experimental results show that SelectIT outperforms existing data selection methods, particularly on reasoning and coding tasks. The method is robust across different foundation models and domain-specific tasks, and the accompanying analysis suggests that longer, more computationally intensive IT data can lead to better performance. SelectIT also effectively filters out abnormal data while retaining diverse, high-quality samples. The approach is flexible and applicable across models and domains, and its results are validated through extensive experiments on multiple benchmarks. The study highlights the importance of data quality in instruction tuning and offers insights into the characteristics of optimal IT data. The code and data are available for further research.
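The abstract does not spell out the scoring mechanics, but one way to picture "uncertainty-based self-reflection" is below: a minimal sketch assuming a HuggingFace-style causal LM, where a token-level score is read off the model's next-token distribution over rating tokens after a quality-rating prompt, and a sentence-level score aggregates token-level scores from several paraphrased prompts, penalizing disagreement. The function names, the 1-5 rating scale, and the spread-based penalty are illustrative assumptions, not the authors' exact formulation.

```python
import statistics

import torch
import torch.nn.functional as F


def token_level_score(model, tokenizer, rating_prompt, device="cpu"):
    """Illustrative token-level self-reflection (assumed scheme).

    Prompts the model to rate an IT sample from 1 to 5, then reads the
    next-token distribution restricted to the rating tokens "1".."5"
    and returns the expected rating.
    """
    inputs = tokenizer(rating_prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # next-token logits
    rating_ids = [tokenizer.convert_tokens_to_ids(str(k)) for k in range(1, 6)]
    probs = F.softmax(logits[rating_ids], dim=-1)  # renormalize over "1".."5"
    ratings = torch.arange(1, 6, dtype=probs.dtype)
    return (probs * ratings).sum().item()  # expected rating in [1, 5]


def sentence_level_score(scores):
    """Illustrative sentence-level aggregation (assumed scheme).

    Averages token-level scores obtained from K paraphrased rating
    prompts; higher disagreement across prompts (population std) is
    treated as uncertainty and scales the mean down.
    """
    mean = statistics.mean(scores)
    spread = statistics.pstdev(scores)
    return mean / (1.0 + spread)
```

A model-level step, in the same spirit, would combine these sentence-level scores across several foundation models and keep the top-scoring fraction of the dataset; for the exact formulation, see the released code and data referenced in the abstract.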