23 May 2024 | Zhengyan Shi, Adam X. Yang, Bin Wu, Laurence Aitchison, Emine Yilmaz, Aldo Lipani
This paper introduces INSTRUCTION MODELLING (IM), a method for instruction tuning that improves the performance of language models (LMs) on both NLP tasks and open-ended generation benchmarks. Unlike traditional instruction tuning, which applies the loss function only to the output (completion) part of each training example, IM applies the loss to both the instruction and the completion parts. This approach enhances the LM's ability to follow instructions and perform well across a variety of tasks. The study shows that IM significantly improves performance on benchmarks such as AlpacaEval 1.0, with improvements of over 100% in some cases.

The effectiveness of IM is influenced by two key factors: the ratio between instruction and output lengths in the training data, and the number of training examples. IM is particularly beneficial when training on datasets with long instructions and short outputs, or under the Superficial Alignment Hypothesis (SAH), where only a small amount of training data is used. The study also finds that IM reduces overfitting to instruction tuning datasets, leading to better generalization. Additionally, IM performs well across different LM sizes and can be combined with other methods such as NEFTUNE. The results show that IM outperforms traditional instruction tuning in various scenarios, especially in low-resource settings. The paper offers practical guidance for instruction tuning, highlighting the importance of the instruction-to-output length ratio and the quantity of training data when designing instruction tuning strategies.
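To make the difference concrete, the sketch below contrasts the two loss computations on per-token log-probabilities. This is a minimal illustration, not the paper's implementation: the function name `sequence_loss`, the toy log-probabilities, and the list-based representation are all assumptions for exposition. Standard instruction tuning averages the negative log-likelihood over completion tokens only, while IM averages it over instruction and completion tokens together.

```python
import math

def sequence_loss(token_logprobs, instruction_len, instruction_modelling=False):
    """Mean negative log-likelihood over the tokens that receive loss.

    token_logprobs: per-token log-probabilities for the full sequence
                    (instruction tokens first, then completion tokens).
    instruction_len: number of instruction tokens at the start.
    instruction_modelling: if True, instruction tokens also contribute
                           to the loss (IM); if False, only completion
                           tokens do (standard instruction tuning).
    """
    start = 0 if instruction_modelling else instruction_len
    selected = token_logprobs[start:]
    return -sum(selected) / len(selected)

# Toy sequence: 3 instruction tokens followed by 2 completion tokens.
lps = [math.log(0.5), math.log(0.25), math.log(0.5),   # instruction
       math.log(0.125), math.log(0.25)]                # completion

it_loss = sequence_loss(lps, instruction_len=3)        # completion-only loss
im_loss = sequence_loss(lps, instruction_len=3, instruction_modelling=True)
```

In practice the same effect is often achieved via label masking (e.g. setting instruction-token labels to an ignore index for standard tuning, and leaving them unmasked for IM); the explicit slice above just makes the distinction visible.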