This paper investigates the relationship between the base of Rotary Position Embedding (RoPE) and the maximum context length that large language models (LLMs) can effectively process. RoPE is a widely adopted technique for encoding position information in LLMs, and adjusting its base parameter has become a standard way to extend context length. The paper reveals, however, that for a given target context length the base has an absolute lower bound: a base below this bound may yield only a superficial long-context capability, in which the model maintains low perplexity on long inputs but fails to retrieve information from them. This bound is derived theoretically and validated empirically.

The paper also argues that the out-of-distribution (OOD) theory commonly invoked to explain context-length extension is insufficient to fully capture a model's ability to process long contexts. Its theoretical analysis instead shows that the model's ability to attend to similar tokens decays as their relative distance grows, a long-term decay that ultimately limits the effective context length.

These findings are validated through extensive experiments on several LLMs, including Llama2-7B, Baichuan2-7B, and a 2-billion-parameter model trained from scratch. The results show that the base of RoPE directly determines the maximum context length the model can process, and that this bound holds in both the pre-training and fine-tuning stages. The paper concludes that the base of RoPE is a critical factor in the long-context capability of LLMs, and that a deeper understanding of this relationship is essential for improving long-context performance.
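To make the role of the base concrete, below is a minimal sketch of the standard RoPE rotation, in which each pair of query/key dimensions is rotated by an angle proportional to the token position, with per-pair frequencies θ_i = base^(−2i/d). The function name and the use of NumPy are illustrative choices, not taken from the paper.

```python
import numpy as np

def rope_rotate(x: np.ndarray, position: int, base: float = 10000.0) -> np.ndarray:
    """Apply RoPE to a single query/key vector x of even dimension d.

    Each pair (x[2i], x[2i+1]) is rotated by angle position * theta_i,
    where theta_i = base ** (-2i / d). A larger base gives slower-varying
    angles, i.e., lower rotation frequencies.
    """
    d = x.shape[-1]
    assert d % 2 == 0, "RoPE requires an even head dimension"
    i = np.arange(d // 2)
    theta = base ** (-2.0 * i / d)          # per-pair rotation frequencies
    angles = position * theta               # one angle per 2-D subspace
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x_even * cos - x_odd * sin  # standard 2-D rotation
    out[1::2] = x_even * sin + x_odd * cos
    return out
```

Because attention logits depend on the angle difference between query and key rotations, this construction makes the logit a function of relative distance, with the base controlling how quickly that function decays.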
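The long-term decay described above can be illustrated numerically. The sketch below assumes, as one plausible reading of the paper's criterion, that the relevant quantity is B(m) = Σ_i cos(m·θ_i), the (up-to-scaling) attention a query pays to a similar key at relative distance m, and that the supported context length is the largest m for which B stays positive. The function names and the specific positivity criterion are assumptions for illustration, not the paper's exact derivation.

```python
import numpy as np

def attention_decay(m, base: float, d: int = 128) -> np.ndarray:
    """B(m) = sum_i cos(m * theta_i), theta_i = base ** (-2i / d).

    Up to scaling, this models how strongly a query attends to a
    similar key at relative distance m (the long-term decay quantity).
    """
    i = np.arange(d // 2)
    theta = base ** (-2.0 * i / d)
    return np.cos(np.outer(np.atleast_1d(m), theta)).sum(axis=-1)

def max_supported_context(base: float, d: int = 128,
                          limit: int = 1_000_000, chunk: int = 8192) -> int:
    """Largest distance m such that B(m') > 0 for all m' <= m,
    i.e., the context length this base can support under the
    positivity criterion assumed above."""
    for start in range(1, limit + 1, chunk):
        m = np.arange(start, min(start + chunk, limit + 1))
        b = attention_decay(m, base, d)
        neg = np.flatnonzero(b <= 0)
        if neg.size:
            return int(m[neg[0]]) - 1  # last distance before B drops to zero
    return limit

# Example: scan a few bases for a Llama2-style head dimension of 128.
for base in (1e4, 1e5, 1e6):
    print(f"base={base:g}: supported context ~= {max_supported_context(base)}")
```

Under this criterion, raising the base pushes the first zero crossing of B(m) farther out, which matches the paper's conclusion that larger bases are required to genuinely support longer contexts.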