LongAlign: A Recipe for Long Context Alignment of Large Language Models


31 Jan 2024 | Yushi Bai, Xin Lv, Jiajie Zhang, Yuze He, Ji Qi, Lei Hou, Jie Tang, Yuxiao Dong, Juanzi Li
**Institution:** Tsinghua University & Zhipu.AI

**Abstract:** Extending large language models (LLMs) to handle long contexts requires instruction fine-tuning on input sequences of similar lengths. To address this, the authors present LongAlign, a recipe for instruction data construction, training, and evaluation for long context alignment. They build a diverse long instruction-following dataset using Self-Instruct, adopt packing and sorted batching strategies to speed up supervised fine-tuning, and develop a loss weighting method to balance each sequence's contribution to the loss. They also introduce the LongBench-Chat benchmark to evaluate instruction-following capabilities on queries of 10k-100k tokens in length. Experiments show that LongAlign outperforms existing methods by up to 30% on long context tasks while maintaining proficiency on short, generic tasks.

**Key Contributions:**
1. **Data Construction:** They collect long sequences from nine sources and use Self-Instruct to generate 10k instruction-following samples of 8k-64k tokens in length.
2. **Efficient Training:** They adopt packing and sorted batching strategies to speed up training and introduce a loss weighting method to balance each sequence's contribution to the loss.
3. **Evaluation:** They develop LongBench-Chat, a benchmark for evaluating instruction-following capabilities on real-world queries of 10k-100k tokens in length.

**Experiments:**
- **Data Influence:** More long instruction data enhances performance on long tasks without compromising short tasks.
- **Diversity of Data:** Diversity in long instruction data benefits instruction-following abilities.
- **Training Methods:** Packing and sorted batching double training efficiency while maintaining good performance (see the batching and packing sketch below).
- **Loss Weighting:** Loss weighting significantly improves performance on long instruction tasks (see the loss-weighting sketch after the training sketch).

**Conclusion:** LongAlign effectively aligns models to handle contexts of up to 64k tokens while maintaining performance on general tasks. The code, data, and long-aligned models are open-sourced.
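The summary above does not include implementation details, so the following is a minimal sketch of what sorted batching and greedy sequence packing could look like; the function names and the `max_len` value are illustrative assumptions, not taken from the paper.

```python
import random

def sorted_batches(lengths, batch_size):
    """Sorted batching: group examples of similar length into the same batch.

    `lengths` holds the token count of each example; returns batches of indices.
    Batch order is shuffled so training still sees a mix of lengths over time.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
    random.shuffle(batches)
    return batches

def greedy_pack(lengths, max_len=65536):
    """Packing: greedily concatenate sequences into packs of at most `max_len` tokens.

    Assumes every sequence fits within `max_len`. Each pack is a list of example
    indices; during training the attention mask should be block-diagonal so
    sequences packed together do not attend to one another.
    """
    packs, current, used = [], [], 0
    for i in sorted(range(len(lengths)), key=lambda i: -lengths[i]):
        if used + lengths[i] > max_len and current:
            packs.append(current)
            current, used = [], 0
        current.append(i)
        used += lengths[i]
    if current:
        packs.append(current)
    return packs
```

In practice, packing only pays off with an attention implementation that respects per-sequence boundaries inside a pack (e.g., a block-diagonal mask); otherwise tokens from different examples would attend to each other.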
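The summary does not give the exact loss weighting formula, so the sketch below only illustrates the idea as stated: normalize each sequence's loss by its own number of target tokens and average over sequences, rather than dividing a pack's summed loss by its total token count (which would let long sequences dominate). The function and argument names are assumptions for illustration.

```python
import torch

def weighted_pack_loss(token_losses, seq_ids, num_seqs_in_batch):
    """Illustrative per-sequence loss weighting for a packed batch.

    token_losses:      1D tensor of per-token cross-entropy losses for one pack
    seq_ids:           1D tensor mapping each token to the sequence it belongs to
    num_seqs_in_batch: total number of sequences in the global batch
    """
    loss = token_losses.new_zeros(())
    for sid in seq_ids.unique():
        mask = seq_ids == sid
        # Average over this sequence's own target tokens, so sequence length
        # no longer determines how much the sequence contributes.
        loss = loss + token_losses[mask].mean()
    # Divide by the number of sequences in the global batch rather than by the
    # number of tokens in the pack.
    return loss / num_seqs_in_batch
```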