The paper introduces Context Expansion with Parallel Encoding (CEPE), a framework that extends the context window of large language models (LLMs) by processing long inputs chunk by chunk with a small encoder. CEPE adds cross-attention modules that let the frozen decoder attend to the encoded contexts, offering efficiency, generalizability, and versatility. Trained on 8K-token documents, CEPE extends the context window of LLAMA-2 to 128K tokens, achieving 10x higher throughput while using only 1/6 of the memory. CEPE performs strongly on language modeling, in-context learning, and retrieval-augmented applications, outperforming existing methods. The paper also introduces CEPED, a variant of CEPE that extends the context window of instruction-tuned models using only unlabeled data, further improving their performance on long-text understanding tasks.
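To make the architecture concrete, the following is a minimal sketch of the parallel-encoding idea, not the authors' implementation: a small encoder embeds each context chunk independently, and a trainable cross-attention block lets a frozen decoder's hidden states attend over the concatenated chunk representations. All module names, sizes, and hyperparameters here are illustrative assumptions.

```python
# Hypothetical sketch of CEPE-style parallel encoding + cross-attention (PyTorch).
import torch
import torch.nn as nn


class ChunkEncoder(nn.Module):
    """Small encoder applied independently (in parallel) to each context chunk."""

    def __init__(self, vocab_size: int, d_model: int = 256, n_layers: int = 2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, chunks: torch.Tensor) -> torch.Tensor:
        # chunks: (batch, n_chunks, chunk_len) -> (batch, n_chunks * chunk_len, d_model)
        b, n, l = chunks.shape
        h = self.encoder(self.embed(chunks.reshape(b * n, l)))
        return h.reshape(b, n * l, -1)


class CrossAttentionBlock(nn.Module):
    """Trainable cross-attention that injects encoded context into a frozen decoder."""

    def __init__(self, d_decoder: int, d_encoder: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(
            d_decoder, n_heads, kdim=d_encoder, vdim=d_encoder, batch_first=True
        )
        self.norm = nn.LayerNorm(d_decoder)

    def forward(self, hidden: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # hidden: frozen decoder states (batch, seq, d_decoder); context: encoder outputs.
        out, _ = self.attn(hidden, context, context)
        return self.norm(hidden + out)  # residual keeps the frozen decoder path intact


if __name__ == "__main__":
    encoder = ChunkEncoder(vocab_size=32000)
    xattn = CrossAttentionBlock(d_decoder=4096, d_encoder=256)
    ctx = encoder(torch.randint(0, 32000, (1, 4, 128)))  # four 128-token chunks
    hidden = torch.randn(1, 16, 4096)                     # stand-in for decoder states
    fused = xattn(hidden, ctx)                            # (1, 16, 4096)
    print(fused.shape)
```

Because each chunk is encoded independently, the encoder's cost grows linearly with the number of chunks, and only the small encoder and cross-attention parameters are trained while the decoder stays frozen, which is consistent with the efficiency and memory claims summarized above.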