23 May 2024 | Zhengyan Shi, Adam X. Yang, Bin Wu, Laurence Aitchison, Emine Yilmaz, Aldo Lipani
The paper introduces INSTRUCTION MODELLING (IM), a method that trains language models (LMs) by applying the loss to the instruction and prompt tokens in addition to the output tokens, rather than to the output alone. The approach aims to improve LM performance on both traditional natural language processing (NLP) tasks and open-ended generation benchmarks. Through extensive experiments across 21 benchmarks, the authors show that IM yields substantial gains, most notably on the AlpacaEval 1.0 benchmark, where scores improve by over 100% in some settings.
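As a rough illustration of the difference, here is a minimal sketch (not the authors' code) of how per-token labels could be built for the two objectives, assuming the common convention that positions labelled `-100` are ignored by the cross-entropy loss (as in PyTorch and the Hugging Face Trainer):

```python
# Minimal sketch contrasting label construction for standard instruction
# tuning (IT) vs. instruction modelling (IM). Assumes a causal LM whose loss
# ignores positions labelled -100 (the usual PyTorch/Hugging Face convention).

from typing import List

IGNORE_INDEX = -100  # positions with this label contribute no loss


def build_labels(prompt_ids: List[int], output_ids: List[int], use_im: bool) -> List[int]:
    """Return per-token labels for one training example.

    IT -> loss only on the output tokens (prompt positions masked out).
    IM -> loss on the instruction/prompt tokens as well as the output tokens.
    """
    if use_im:
        prompt_labels = list(prompt_ids)                   # keep loss on the prompt
    else:
        prompt_labels = [IGNORE_INDEX] * len(prompt_ids)   # mask the prompt
    return prompt_labels + list(output_ids)


# Hypothetical token ids for an instruction and its response.
prompt_ids = [101, 7592, 2088]
output_ids = [2023, 2003, 102]

print(build_labels(prompt_ids, output_ids, use_im=False))  # IT: [-100, -100, -100, 2023, 2003, 102]
print(build_labels(prompt_ids, output_ids, use_im=True))   # IM: [101, 7592, 2088, 2023, 2003, 102]
```

In this sketch the only difference between the two objectives is whether the prompt positions are masked out of the loss; IM simply keeps them in.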
Key findings include:
1. **Ratio of Instruction to Output Length**: IM is most effective when training datasets pair lengthy instructions with brief outputs (a quick check of this ratio is sketched after this list).
2. **Number of Training Examples**: IM is particularly beneficial when only a small number of training examples is available, the low-resource setting assumed by the Superficial Alignment Hypothesis (SAH).
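A hedged sketch of how one might estimate the instruction-to-output length ratio for a dataset before deciding whether IM is likely to help; the example pairs and the use of whitespace tokens are illustrative assumptions, not the paper's exact measurement:

```python
# Estimate the mean instruction-to-output length ratio of a dataset.
# Lengths are whitespace-token counts here, an illustrative simplification.

from statistics import mean

# Hypothetical (instruction, output) pairs standing in for a real dataset.
dataset = [
    ("Summarise the following article about ...", "Short summary."),
    ("Given the table below, answer the question ...", "42"),
]

ratios = [len(instr.split()) / max(len(out.split()), 1) for instr, out in dataset]
print(f"mean instruction/output length ratio: {mean(ratios):.2f}")
# Per the paper's first finding, larger ratios (long instructions, brief
# outputs) are where IM tends to help most.
```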
The authors also analyze the mechanisms behind IM's effectiveness, suggesting that it reduces overfitting to instruction tuning datasets. They find that IM exhibits higher training losses but lower test losses compared to traditional instruction tuning (IT), indicating reduced overfitting. Additionally, IM generates outputs with lower similarity to training examples, further supporting its effectiveness.
The paper concludes by providing practical guidance for instruction tuning LMs, emphasizing the importance of considering the ratio of instruction to output length and the number of training examples. The code for IM is available at <https://github.com/ZhengxiangShi/InstructionModelling>.