2 Feb 2024 | Peitian Zhang, Zheng Liu, Shitao Xiao, Ninglu Shao, Qiwei Ye, Zhicheng Dou
The paper introduces Activation Beacon, a novel method to extend the context window of large language models (LLMs) without significantly increasing training or inference costs. The method condenses the LLM's raw activations into more compact forms, allowing the LLM to perceive a longer context with a limited context window. Activation Beacon is designed as a plug-in module that preserves the LLM's original capabilities on short contexts and works with a sliding window to stream-process long contexts efficiently. The method is trained on short-sequence data with diversified condensing ratios, enabling it to support different context lengths at minimal training cost. Experiments on Llama-2-7B show that Activation Beacon can extend the context length by 100 times (from 4K to 400K) while maintaining high-quality generation and superior performance across various long-context tasks. The method is also compatible with other context extension techniques and retrieval methods, further enhancing its capabilities.
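To make the sliding-window condensing idea concrete, here is a minimal, hypothetical sketch. The paper's beacon module is a learned component operating on attention activations; the mean-pooling "condenser", the toy one-dimensional activations, and all function names below are illustrative stand-ins, not the actual implementation.

```python
def condense(activations, ratio):
    """Condense a window of activation vectors by averaging groups of `ratio`.

    In Activation Beacon this condensing is performed by a learned module;
    mean-pooling here is only a placeholder to show the shape of the idea.
    """
    return [
        [sum(dim) / len(dim) for dim in zip(*activations[i:i + ratio])]
        for i in range(0, len(activations), ratio)
    ]


def stream_process(tokens, window=4, ratio=2):
    """Slide over the token stream, accumulating condensed past activations.

    At each step the model would attend to `memory` (compact history) plus
    the raw activations of the current window, so the effective context
    grows by roughly `ratio`x without enlarging the attention window.
    """
    memory = []  # condensed activations carried across windows
    for start in range(0, len(tokens), window):
        # Fake 1-d activations for illustration; a real model produces
        # per-layer key/value activations here.
        raw = [[float(t)] for t in tokens[start:start + window]]
        memory.extend(condense(raw, ratio))
    return memory


# 16 tokens, window of 4, condensing ratio 2 -> 8 condensed entries,
# i.e. the history occupies half the slots raw activations would.
mem = stream_process(list(range(16)), window=4, ratio=2)
```

The key design point the sketch mirrors is that only the compact `memory` is kept across windows, so the cost of attending to long history stays bounded while the raw window remains untouched, preserving short-context behavior.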