Long Context Alignment with Short Instructions and Synthesized Positions

7 May 2024 | Wenhao Wu, Yizhong Wang, Yao Fu, Xiang Yue, Dawei Zhu, Sujian Li
This paper introduces SkipAlign, a novel technique designed to enhance the long-context capabilities of Large Language Models (LLMs) without requiring additional data or computational resources. SkipAlign synthesizes long-range dependencies by strategically inserting skipped positions within instruction-following samples, leveraging the semantic structure of the data to expand the effective context. The method is evaluated on a range of long-context tasks using base models with different context window sizes, demonstrating its effectiveness. Notably, a 6B-parameter SkipAlign model achieves performance comparable to strong baselines such as GPT-3.5-Turbo-16K on the LongBench benchmark. The code and SkipAligned models are open-sourced, providing a practical solution for improving LLMs' long-context processing.
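The key idea is to manipulate position indices rather than the text itself: a short instruction-response pair is assigned position ids that jump ("skip") across a much wider range, so the model must attend over long positional distances during alignment. The summary above does not spell out the exact sampling procedure, so the snippet below is only a minimal sketch under assumed details; the function name skip_positions, the choice of boundaries, and the uniform skip sampling are illustrative assumptions, not the paper's exact algorithm.

```python
import random
from typing import List, Set

def skip_positions(seq_len: int,
                   boundaries: Set[int],
                   max_target_len: int = 16384,
                   rng: random.Random = None) -> List[int]:
    """Illustrative sketch: assign position ids to a short training sample,
    inserting random jumps ("skips") at the given token boundaries so the
    ids span a much longer range than the text itself.

    `boundaries` are token indices (e.g. the instruction/response border)
    where a skip may be inserted; the overall span is capped by
    `max_target_len` to stay within the extended context window.
    """
    rng = rng or random.Random()
    budget = max_target_len - seq_len          # extra positional range available
    positions, offset = [], 0
    for i in range(seq_len):
        if i in boundaries and budget > 0:
            jump = rng.randint(0, budget)      # random skip at this boundary
            offset += jump
            budget -= jump
        positions.append(i + offset)
    return positions

# Example: a 12-token instruction+response pair with one boundary at token 8.
# The response tokens receive positions far from the instruction, creating a
# synthetic long-range dependency without lengthening the text.
print(skip_positions(seq_len=12, boundaries={8}, max_target_len=64,
                     rng=random.Random(0)))
```

In an actual fine-tuning loop one would typically pass these synthesized ids to the model as its position ids while keeping the tokens, labels, and attention mask unchanged, which is what lets the method avoid collecting longer training samples.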