21 Feb 2024 | Yiran Ding, Li Lyna Zhang, Chengruidong Zhang, Yuanyuan Xu, Ning Shang, Jiahang Xu, Fan Yang, Mao Yang
This paper introduces LongRoPE, a method that extends the context window of large language models (LLMs) to 2048k tokens, significantly beyond the current limit of around 128k tokens. LongRoPE achieves this by leveraging three key innovations: (1) identifying and exploiting non-uniformities in positional interpolation through an efficient search, (2) using a progressive extension strategy, and (3) re-adjusting LongRoPE for shorter context windows. The method maintains performance at the original short context window while enabling an 8× extension in non-fine-tuning scenarios. Extensive experiments on LLaMA2 and Mistral models across various tasks demonstrate the effectiveness of LongRoPE, showing low perplexity from 4k to 2048k evaluation lengths and high passkey retrieval accuracy. The code for LongRoPE will be available at <https://github.com/microsoft/LongRoPE>.
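
To make the non-uniform positional interpolation idea concrete, here is a minimal sketch (not the official LongRoPE implementation) of RoPE angles with per-dimension rescale factors. Standard RoPE uses frequencies `base^(-2i/d)`; uniform interpolation divides every position by one global scale, whereas the sketch below assigns each frequency dimension its own factor, which in LongRoPE would be found by the efficient search. The function name `rope_angles` and the placeholder factors are illustrative assumptions, not values from the paper.

```python
import torch

def rope_angles(seq_len, dim, base=10000.0, rescale=None):
    """Return the (seq_len, dim/2) matrix of RoPE rotation angles.

    rescale: optional per-dimension factors (tensor of shape (dim/2,));
    each inverse frequency is divided by its own factor, i.e. the
    non-uniform interpolation LongRoPE searches over. rescale=None
    reproduces the original RoPE angles.
    """
    # theta_i = base^(-2i/d), one inverse frequency per rotary dimension pair
    inv_freq = base ** (-torch.arange(0, dim, 2).float() / dim)
    if rescale is not None:
        inv_freq = inv_freq / rescale      # per-dimension rescaling, not one global scale
    positions = torch.arange(seq_len).float()
    return torch.outer(positions, inv_freq)  # angle[m, i] = m * theta_i / lambda_i

# Example: stretch a 4k-trained model toward a 32k window (8x extension)
# with hypothetical per-dimension factors; LongRoPE would search these.
dim, extension = 128, 8.0
lambdas = torch.linspace(1.0, extension, dim // 2)  # placeholder, not searched values
angles = rope_angles(32 * 1024, dim, rescale=lambdas)
```

The design point illustrated here is that interpolating each rotary dimension by a different amount lets the model keep its original behavior on short ranges while stretching the positions that must cover the extended window.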