28 May 2024 | Chaojun Xiao, Pengle Zhang, Xu Han, Guangxuan Xiao, Yankai Lin, Zhengyan Zhang, Zhiyuan Liu, Maosong Sun
InfLLM is a training-free method for long-context extrapolation in large language models (LLMs). It lets LLMs process extremely long sequences without additional training by attaching an efficient context memory: distant context is stored in extra memory units, and an efficient lookup mechanism retrieves the units relevant to the current tokens for attention computation. This allows an LLM with a limited context window to process long sequences efficiently while still capturing long-distance dependencies. Even when the sequence length is scaled to 1,024K tokens, InfLLM matches competitive baselines that are continually trained on long sequences.

At the core of the method is a block-level context memory, which organizes past key-value vectors into blocks and selects the semantically most significant tokens of each block as unit representatives for the subsequent relevance computation. This design makes the lookup both effective and efficient, reducing computational cost and memory usage (see the sketch after this summary).

InfLLM is evaluated on two benchmarks, $ \infty $-Bench and LongBench, demonstrating its effectiveness on long sequences. The results show that InfLLM achieves performance comparable to models that have undergone continual training on long sequences, at significantly lower computational and memory cost, and that it generalizes better than retrieval-augmented generation (RAG).

Further evaluation on sequences of varying lengths shows that InfLLM accurately captures long-distance dependencies for effective long-sequence reasoning, and that it can process sequences of up to 1,024K tokens on a single GPU, outperforming the other methods tested. Overall, the results indicate that InfLLM is a practical and efficient approach to improving the length generalizability of LLMs without any additional training.
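To make the block-level lookup concrete, here is a minimal, single-head sketch in PyTorch. It is an illustration under simplifying assumptions, not the authors' implementation: the block size and number of representatives are arbitrary, representative keys are chosen by key norm rather than the paper's significance scores, block relevance is scored against the mean of the current queries, and positional encodings, multi-head attention, and the initial-token segment are omitted. The function names (`build_block_memory`, `lookup_blocks`, `attend_with_memory`) are hypothetical.

```python
# Minimal single-head sketch of block-level context-memory lookup,
# in the spirit of InfLLM. Illustrative only; see the caveats above.
import torch


def build_block_memory(past_k, past_v, block_size=128, n_repr=4):
    """Split cached keys/values into fixed-size blocks and keep a few
    representative keys per block for cheap relevance scoring.
    (Here representatives are the largest-norm keys, a stand-in for the
    paper's significance-based selection.)"""
    blocks = []
    for start in range(0, past_k.size(0), block_size):
        k_blk = past_k[start:start + block_size]          # (B, d)
        v_blk = past_v[start:start + block_size]
        repr_idx = k_blk.norm(dim=-1).topk(min(n_repr, k_blk.size(0))).indices
        blocks.append({"k": k_blk, "v": v_blk, "repr_k": k_blk[repr_idx]})
    return blocks


def lookup_blocks(query, blocks, top_k=2):
    """Score each block by the similarity between the mean current query and
    the block's representative keys; return the KV of the top-k blocks."""
    q_mean = query.mean(dim=0)                            # (d,)
    scores = torch.stack([(blk["repr_k"] @ q_mean).mean() for blk in blocks])
    chosen = scores.topk(min(top_k, len(blocks))).indices.tolist()
    k_sel = torch.cat([blocks[i]["k"] for i in chosen])
    v_sel = torch.cat([blocks[i]["v"] for i in chosen])
    return k_sel, v_sel


def attend_with_memory(query, local_k, local_v, blocks, top_k=2):
    """Scaled dot-product attention over the local window plus the retrieved
    memory blocks; blocks that were not selected are never touched."""
    mem_k, mem_v = lookup_blocks(query, blocks, top_k)
    k = torch.cat([mem_k, local_k])
    v = torch.cat([mem_v, local_v])
    attn = torch.softmax(query @ k.T / k.size(-1) ** 0.5, dim=-1)
    return attn @ v


if __name__ == "__main__":
    d = 64
    past_k, past_v = torch.randn(4096, d), torch.randn(4096, d)   # distant context
    local_k, local_v = torch.randn(256, d), torch.randn(256, d)   # sliding window
    query = torch.randn(16, d)                                    # current chunk
    memory = build_block_memory(past_k, past_v)
    out = attend_with_memory(query, local_k, local_v, memory)
    print(out.shape)   # torch.Size([16, 64])
```

The efficiency argument is visible in `lookup_blocks`: only a handful of representative keys per block are compared against the query, so the cost of deciding which distant context to load grows with the number of blocks rather than the number of cached tokens, and unselected blocks never enter the attention computation at all.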