The paper introduces InfLLM, a training-free method for improving the context-length generalizability of large language models (LLMs). InfLLM addresses the challenge of processing long sequences by combining an efficient context memory with a sliding-window attention mechanism: distant contexts are stored in the memory, allowing LLMs with limited context windows to process long sequences efficiently while still capturing long-distance dependencies. The method requires no additional training and achieves performance comparable to models continually trained on longer sequences. Experiments on benchmarks such as ∞-Bench and LongBench show that InfLLM enables LLMs to handle sequences of up to 1,024K tokens effectively, outperforming methods that rely on continual training or retrieval-augmented generation. The paper also examines the impact of various context-memory parameters and provides ablation studies to validate the proposed approach.
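To make the mechanism concrete, here is a minimal sketch (not the authors' implementation) of the core idea: evicted key/value pairs are stored in a block-level context memory, each block is summarized by a few representative keys for cheap relevance lookup, and attention is computed over the retrieved blocks plus the local sliding window. All names, block sizes, and the representative-selection heuristic below are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class ContextMemory:
    """Stores evicted key/value pairs as fixed-size blocks, each summarized
    by a few representative keys used for cheap relevance scoring."""
    def __init__(self, block_size=4, n_repr=2):
        self.block_size = block_size
        self.n_repr = n_repr
        self.blocks = []  # list of (representative_keys, keys, values)

    def add(self, keys, values):
        # Split evicted context into fixed-size blocks.
        for start in range(0, len(keys), self.block_size):
            k = keys[start:start + self.block_size]
            v = values[start:start + self.block_size]
            # Pick representative tokens by key norm -- a stand-in for the
            # paper's representative-token selection.
            reps = k[np.argsort(-np.linalg.norm(k, axis=-1))[: self.n_repr]]
            self.blocks.append((reps, k, v))

    def lookup(self, query, top_k=2):
        # Score each block by its most relevant representative key,
        # then return the keys/values of the top-k blocks.
        if not self.blocks:
            d = query.shape[-1]
            return np.empty((0, d)), np.empty((0, d))
        scores = [float((reps @ query).max()) for reps, _, _ in self.blocks]
        chosen = np.argsort(scores)[-top_k:]
        keys = np.concatenate([self.blocks[i][1] for i in chosen])
        values = np.concatenate([self.blocks[i][2] for i in chosen])
        return keys, values

def memory_augmented_attention(query, local_k, local_v, memory, d):
    """Attend over the sliding window plus the retrieved memory blocks."""
    mem_k, mem_v = memory.lookup(query)
    k = np.concatenate([mem_k, local_k]) if len(mem_k) else local_k
    v = np.concatenate([mem_v, local_v]) if len(mem_k) else local_v
    weights = softmax(k @ query / np.sqrt(d))
    return weights @ v
```

In this sketch the memory lookup touches only a few representative keys per block, which is what keeps the cost bounded as the sequence grows, while the sliding window preserves exact attention over recent tokens; the actual InfLLM system applies this per attention head inside a Transformer rather than on raw NumPy arrays.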