31 Jan 2024 | Yushi Bai, Xin Lv, Jiajie Zhang, Yuze He, Ji Qi, Lei Hou, Jie Tang, Yuxiao Dong, Juanzi Li
**LongAlign: A Recipe for Long Context Alignment of Large Language Models**
**Authors:** Yushi Bai, Xin Lv, Jiajie Zhang, Yuze He, Ji Qi, Lei Hou, Jie Tang, Yuxiao Dong, Juanzi Li
**Institution:** Tsinghua University and Zhipu.AI
**Abstract:**
Extending large language models (LLMs) to handle long contexts calls for instruction fine-tuning on input sequences of comparable length. To this end, the authors present LongAlign, a recipe covering instruction data, training, and evaluation for long context alignment. They construct a diverse long instruction-following dataset using Self-Instruct, adopt packing and sorted batching strategies to speed up supervised fine-tuning, and develop a loss weighting method to balance each sequence's contribution to the loss. They also introduce LongBench-Chat, a benchmark that evaluates instruction-following on queries of 10k-100k tokens in length. Experiments show that LongAlign outperforms existing methods by up to 30% on long context tasks while maintaining proficiency on short, generic tasks.
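The sorted batching strategy mentioned in the abstract groups sequences of similar length into the same batch so that short sequences do not idle behind long ones. Below is a minimal sketch of the idea, assuming tokenized examples carrying an `input_ids` field; it is an illustration under those assumptions, not the authors' released training code.

```python
from typing import Dict, List


def sorted_batches(examples: List[Dict], batch_size: int) -> List[List[Dict]]:
    """Group tokenized examples into batches of similar length.

    Sorting by length keeps sequences of comparable length together, reducing
    padding and idle time. Because this changes the sampling order, one would
    typically accumulate gradients over several such batches so individual
    updates are not dominated by a single length range.
    """
    ordered = sorted(examples, key=lambda ex: len(ex["input_ids"]))
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


# Dummy usage: mixed-length sequences end up batched with similar-length peers.
data = [{"input_ids": list(range(n))} for n in (5, 120, 8, 64, 300, 12)]
for batch in sorted_batches(data, batch_size=2):
    print([len(ex["input_ids"]) for ex in batch])
```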
**Key Contributions:**
1. **Data Construction:** They collect long sequences from nine sources and use Self-Instruct to generate 10k instruction-following data instances ranging from 8k to 64k tokens in length.
2. **Efficient Training:** They adopt packing and sorted batching strategies to speed up supervised fine-tuning and introduce a loss weighting method that balances each sequence's contribution to the loss during packing (see the sketch after this list).
3. **Evaluation:** They develop LongBench-Chat, a benchmark for evaluating instruction-following on real-world queries of 10k-100k tokens in length.
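For the loss weighting in item 2, here is a minimal PyTorch sketch of one way to implement it, assuming packed inputs where `seq_ids` marks which original sequence each token belongs to and labels of `-100` carry no loss. It reflects one reading of the scheme (normalize each sequence's summed token loss by its own target-token count, then average over the M sequences in the global batch), not the authors' released code.

```python
import torch
import torch.nn.functional as F


def weighted_packed_loss(logits: torch.Tensor,
                         labels: torch.Tensor,
                         seq_ids: torch.Tensor,
                         num_sequences_in_batch: int) -> torch.Tensor:
    """Loss for one packed sample with per-sequence weighting.

    logits:  [T, vocab] next-token logits for the packed token stream.
    labels:  [T] target ids; -100 marks positions that carry no loss.
    seq_ids: [T] index of the original sequence each token belongs to.
    num_sequences_in_batch: M, total number of sequences in the global batch.
    """
    token_loss = F.cross_entropy(logits, labels, ignore_index=-100, reduction="none")
    valid = labels != -100

    loss = logits.new_zeros(())
    for sid in seq_ids[valid].unique():
        mask = valid & (seq_ids == sid)
        n_i = mask.sum()                      # target tokens of sequence i
        loss = loss + token_loss[mask].sum() / n_i
    return loss / num_sequences_in_batch      # average over sequences, not tokens


# Tiny dummy usage: 6 packed tokens from 2 sequences, vocabulary of 10.
logits = torch.randn(6, 10)
labels = torch.tensor([1, 2, -100, 3, 4, 5])
seq_ids = torch.tensor([0, 0, 0, 1, 1, 1])
print(weighted_packed_loss(logits, labels, seq_ids, num_sequences_in_batch=2))
```

Dividing by M rather than by the number of sequences packed into this particular sample keeps the scaling consistent with averaging the loss over every sequence in the global batch, so sequences with many target tokens no longer dominate the gradient.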
**Experiments:**
- **Data Influence:** Increasing the amount of long instruction data improves performance on long tasks without compromising performance on short tasks.
- **Diversity of Data:** Diversity in long instruction data is beneficial for instruction-following abilities.
- **Training Methods:** Packing and sorted batching double training efficiency while maintaining good performance.
- **Loss Weighting:** Significantly improves performance on long instruction tasks.
**Conclusion:**
LongAlign effectively aligns models to handle contexts of up to 64k tokens while maintaining performance on general tasks. The code, data, and long-aligned models are open-sourced.