2 Feb 2024 | Peitian Zhang, Zheng Liu, Shitao Xiao, Ninglu Shao, Qiwei Ye, Zhicheng Dou
Activation Beacon is a method that extends the context length of large language models (LLMs) by condensing their raw activations into compact forms, letting the LLM perceive far longer inputs without enlarging its context window. It is implemented as a plug-in module: the LLM's original capabilities on short contexts are fully preserved, while longer contexts are handled through the condensed activations. Long inputs are processed with a sliding window, one chunk at a time, which keeps memory and time costs competitive during both training and inference. The module is trained purely on short-sequence data with diversely sampled condensing ratios, so a single model supports a wide range of context lengths at minimal training cost. Experiments show that Activation Beacon extends Llama-2-7B from a 4K to a 400K context, achieving high-quality context extension and superior performance on long-context tasks. Because it leaves the base model untouched, it is compatible with existing LLMs and can be combined with complementary techniques such as position interpolation and retrieval to push context extension further. Overall, Activation Beacon offers a cost-effective and efficient route to extending LLM context lengths.
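To make the sliding-window condensation concrete, below is a minimal, self-contained sketch of the core idea. Note the assumptions: the mean-pooling `condense` function is a toy stand-in for the paper's learned beacon module, and all names, shapes, and the fixed ratio are illustrative choices, not the authors' actual implementation.

```python
import torch

def condense(activations: torch.Tensor, ratio: int) -> torch.Tensor:
    """Toy stand-in for the learned beacon module: condense a window of
    per-token activations (seq_len, hidden) into seq_len // ratio compact
    slots by mean-pooling consecutive groups of `ratio` tokens."""
    seq_len, hidden = activations.shape
    usable = (seq_len // ratio) * ratio  # drop any remainder tokens
    return activations[:usable].view(-1, ratio, hidden).mean(dim=1)

def sliding_window_condense(token_activations: torch.Tensor,
                            window: int = 4096,
                            ratio: int = 8) -> torch.Tensor:
    """Walk a long sequence window by window, condensing each window's raw
    activations and accumulating the compact forms as memory. The memory
    the model must attend to grows `ratio` times slower than the raw
    context, which is what keeps a fixed window usable on long inputs."""
    memory = []
    for start in range(0, token_activations.size(0), window):
        chunk = token_activations[start:start + window]
        memory.append(condense(chunk, ratio))
    return torch.cat(memory, dim=0)

# 400K tokens of toy activations shrink to 50K condensed slots at ratio 8,
# so each attention step covers 8x more raw context than the window allows.
acts = torch.randn(400_000, 64)
print(sliding_window_condense(acts).shape)  # torch.Size([50000, 64])
```

Under this framing, the reported 4K-to-400K extension corresponds to aggressive condensing (on the order of 100x), and training with a mix of ratios is what lets one module serve many target lengths.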