This paper addresses the challenge of long-context handling in large language models (LLMs) by proposing SelfExtend, a method that extends the context window of LLMs without requiring fine-tuning. The key idea is to use bi-level attention information: grouped attention for capturing dependencies among tokens far apart, and neighbor attention for dependencies among adjacent tokens within a specified range. These attentions are computed using the original model's self-attention mechanism during inference. SelfExtend is designed to map unseen large relative positions to those encountered during pretraining, addressing out-of-distribution (O.O.D.) positional issues. The method is evaluated on various benchmarks, demonstrating its effectiveness in improving LLMs' long-context understanding ability, often outperforming fine-tuning-based methods. The code for SelfExtend is available at <https://github.com/datamlab/LongLM>.
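To make the bi-level idea concrete, below is a minimal sketch of the position-remapping principle the summary describes: relative positions inside a neighbor window are kept exact, while larger (unseen) positions are compressed by a group size so they fall back into the range seen during pretraining. The function name, the `neighbor_window` and `group_size` values, and the shifting rule are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np


def remap_relative_positions(rel_pos: np.ndarray,
                             neighbor_window: int = 512,
                             group_size: int = 4) -> np.ndarray:
    """Illustrative remapping of relative positions (query index - key index).

    Within `neighbor_window`, distances are kept exact (neighbor attention).
    Beyond it, distances are floor-divided by `group_size` (grouped attention)
    and shifted so they continue right after the neighbor window, keeping all
    mapped positions within a range comparable to the pretraining context.
    """
    grouped = rel_pos // group_size
    # Shift grouped positions so the mapping is continuous at the window edge.
    shifted = grouped + neighbor_window - neighbor_window // group_size
    return np.where(rel_pos < neighbor_window, rel_pos, shifted)


# Example: large distances are compressed into a much smaller positional range.
distances = np.array([0, 100, 511, 512, 2048, 8192])
print(remap_relative_positions(distances))  # [0 100 511 512 896 2432]
```

In this sketch, a distance of 8192 is mapped to 2432, which would lie inside a typical 4k pretraining window; the actual method applies this kind of remapping inside the model's existing self-attention at inference time rather than as a standalone function.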