Extending LLMs' Context Window with 100 Samples

13 Jan 2024 | Yikai Zhang, Junlong Li, Pengfei Liu
This paper proposes an entropy-aware ABF method to extend the context window of large language models (LLMs). The method combines adjusted base frequency (ABF) with a dynamic attention scalar that maintains attention entropy, which is crucial for LLMs to process long sequences effectively. The approach is validated on various context-demanding tasks, demonstrating superior fine-tuning performance and robustness across different context window sizes. Notably, the method extends the context window of LLaMA-2-7B-Chat to 16,384 tokens with only 100 samples and 6 training steps, showcasing its efficiency. The study also explores how data composition and training curricula affect context window extension for specific downstream tasks, suggesting that fine-tuning LLMs on lengthy conversations is a good starting point. The method outperforms existing RoPE-extension techniques in data efficiency and robustness, achieving competitive long-context performance with minimal training resources. The paper highlights the importance of maintaining attention entropy for LLMs to function properly and offers practical guidance for extending context windows in real applications.
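The two ingredients described above can be made concrete with a short sketch. The snippet below is illustrative only and is not the paper's implementation: it applies RoPE with an adjusted (enlarged) base frequency and multiplies each query by a length-dependent scalar intended to keep attention entropy stable beyond the pre-training length. The base value of 500,000, the 4,096-token training length, and the log-based form of the scalar are assumptions made for illustration; only the default RoPE base of 10,000 mentioned in the comment is standard.

```python
# Minimal sketch of "ABF + dynamic attention scalar" attention scoring.
# All hyperparameter values here are illustrative assumptions.
import numpy as np

TRAIN_LEN = 4096      # assumed original context window of the base model
ABF_BASE = 500_000    # assumed adjusted base frequency (default RoPE base is 10_000)

def rope_angles(positions: np.ndarray, head_dim: int, base: float) -> np.ndarray:
    """Rotary angles theta_{p,i} = p * base^(-2i/d) for each position p."""
    inv_freq = base ** (-np.arange(0, head_dim, 2) / head_dim)   # (d/2,)
    return np.outer(positions, inv_freq)                          # (seq, d/2)

def apply_rope(x: np.ndarray, angles: np.ndarray) -> np.ndarray:
    """Rotate consecutive channel pairs of x (seq, d) by the given angles."""
    x1, x2 = x[:, 0::2], x[:, 1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x, dtype=np.float64)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

def entropy_stabilizing_scale(positions: np.ndarray, train_len: int) -> np.ndarray:
    """Length-dependent scalar on the attention logits.
    A common choice is log_{train_len}(position), clipped to >= 1 (assumed form)."""
    return np.maximum(1.0, np.log(np.maximum(positions, 2)) / np.log(train_len))

def attention_scores(q, k, q_pos, k_pos, head_dim, base, train_len):
    """Scaled dot-product scores with ABF RoPE and the dynamic query scalar."""
    q_rot = apply_rope(q, rope_angles(q_pos, head_dim, base))
    k_rot = apply_rope(k, rope_angles(k_pos, head_dim, base))
    scale = entropy_stabilizing_scale(q_pos, train_len)[:, None]  # per query row
    return (q_rot * scale) @ k_rot.T / np.sqrt(head_dim)

# Usage: score the last 4 query positions of a 16K-token sequence
# against all keys, with a model assumed to be trained at 4K.
seq_len, head_dim = 16_384, 128
rng = np.random.default_rng(0)
q = rng.standard_normal((4, head_dim))
k = rng.standard_normal((seq_len, head_dim))
q_pos = np.arange(seq_len - 4, seq_len)
k_pos = np.arange(seq_len)
scores = attention_scores(q, k, q_pos, k_pos, head_dim, ABF_BASE, TRAIN_LEN)
print(scores.shape)  # (4, 16384)
```

The intent of the scalar is that it grows slowly with the query position, sharpening the attention distribution just enough to offset the entropy increase that comes from attending over many more tokens than were seen during training.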