SelectIT: Selective Instruction Tuning for Large Language Models via Uncertainty-Aware Self-Reflection


26 Feb 2024 | Liangxin Liu, Xuebo Liu, Derek F. Wong, Dongfang Li, Ziyi Wang, Baotian Hu, Min Zhang
The paper introduces SelectIT, a novel approach to instruction tuning (IT) for large language models (LLMs) that leverages the intrinsic uncertainty of LLMs to select high-quality IT data without requiring additional models or resources. The authors propose a three-grained uncertainty evaluation method spanning token-, sentence-, and model-level self-reflection, which improves the accuracy and reliability of IT data selection. By applying SelectIT to the Alpaca-GPT4 dataset, they create Selective Alpaca, a compact and superior IT dataset. Empirical results show that SelectIT significantly improves LLM performance on various benchmarks, particularly on reasoning and computational tasks. The method also proves robust across different foundation models and domain-specific tasks, and the analysis suggests that longer, more computationally intensive IT data may be more effective. The paper concludes with a discussion of limitations and future directions, including exploring data-quantity thresholds, scaling the method to larger models, broadening the scope to alternative foundation models, and extending the methodology to other instruction tuning datasets.
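To make the three-grained idea concrete, here is a minimal sketch of uncertainty-aware scoring in the spirit of SelectIT. The structure mirrors the token → sentence → model aggregation described above, but all function names are hypothetical, and the specific confidence and weighting terms (the rating margin, the standard-deviation penalty, the per-model weights) are illustrative assumptions, not the paper's exact equations.

```python
import numpy as np

def token_level_score(rating_probs: np.ndarray) -> float:
    """Token-level self-reflection: expected rating from the LLM's
    next-token distribution over the K rating tokens ('1'..'K'),
    scaled up when the top rating stands out from the alternatives.
    The margin term is an illustrative stand-in for the paper's
    token-level uncertainty."""
    p = rating_probs / rating_probs.sum()      # normalize over rating tokens
    ratings = np.arange(1, len(p) + 1)
    expected = float(p @ ratings)              # probability-weighted rating
    top = int(p.argmax())
    margin = float(np.mean(p[top] - np.delete(p, top)))
    return expected * (1.0 + margin)

def sentence_level_score(scores: list[float]) -> float:
    """Sentence-level self-reflection: average the token-level scores
    obtained from several paraphrased rating prompts, down-weighted
    by their disagreement (standard deviation)."""
    s = np.asarray(scores)
    return float(s.mean() / (1.0 + s.std()))

def model_level_score(per_model_scores: list[float],
                      weights: list[float]) -> float:
    """Model-level self-reflection: combine sentence-level scores from
    foundation models of different sizes with fixed weights."""
    w = np.asarray(weights) / np.sum(weights)
    return float(w @ np.asarray(per_model_scores))

# Example: probabilities an LLM assigns to the rating tokens '1'..'5'
# when asked to grade one instruction-response pair (values invented).
probs = np.array([0.05, 0.05, 0.15, 0.45, 0.30])
t = token_level_score(probs)
s = sentence_level_score([t, t * 0.9, t * 1.1])       # three prompt paraphrases
print(model_level_score([s, s * 1.05], [7.0, 13.0]))  # e.g. 7B and 13B models
```

Samples would then be ranked by this final score and only the top fraction kept as the curated IT dataset (Selective Alpaca in the paper's case).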