2024 | Woomin Song, Seunghyuk Oh, Sangwoo Mo, Jaehyung Kim, Sukmin Yun, Jung-Woo Ha, Jinwoo Shin
This paper introduces HOMER (Hierarchical cOntext MERging), a novel training-free method to extend the context limit of large language models (LLMs) while maintaining computational efficiency. HOMER employs a divide-and-conquer strategy, dividing long inputs into manageable chunks and merging them hierarchically across transformer layers. Before merging, token reduction is applied to ensure memory efficiency. The method also introduces an optimized computational order that makes memory requirements scale logarithmically with input length, making it suitable for memory-constrained environments. Experiments show that HOMER achieves high accuracy and memory efficiency, outperforming baselines in tasks such as passkey retrieval and question answering. It also maintains low perplexity on long documents, demonstrating its ability to handle extended contexts. HOMER can be applied to pre-trained LLMs without further training, making it practical for real-world applications. The method is compatible with conventional positional encoding scaling techniques and further improves performance when used in conjunction with them. The paper also analyzes the computational efficiency of HOMER, showing significant memory savings and reduced computational overhead. Overall, HOMER provides an effective solution for extending the context limit of LLMs while maintaining efficiency.
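To make the divide-and-conquer idea concrete, here is a minimal conceptual sketch of hierarchical chunk merging with token reduction. It is not the authors' implementation: the `reduce_tokens` criterion, the chunk representation, and the breadth-first traversal order are all simplifying assumptions chosen for readability; the paper operates on transformer hidden states and uses an optimized computational order to achieve the logarithmic memory scaling mentioned above.

```python
# Conceptual sketch of hierarchical context merging (illustration only,
# not the HOMER implementation from the paper).

from typing import List

Chunk = List[str]  # stand-in for a chunk's token (or KV-cache) representation


def reduce_tokens(chunk: Chunk, keep: int) -> Chunk:
    """Placeholder token reduction: keep the first `keep` tokens.

    The paper prunes tokens by an importance criterion; this uniform
    truncation only illustrates the interface.
    """
    return chunk[:keep]


def hierarchical_merge(chunks: List[Chunk], keep: int) -> Chunk:
    """Merge chunks pairwise, level by level, reducing tokens before each merge.

    With n chunks the merge tree has O(log n) levels; peak memory depends on
    the traversal order, and this breadth-first version is simply the easiest
    to read.
    """
    level = [reduce_tokens(c, keep) for c in chunks]
    while len(level) > 1:
        merged = []
        for i in range(0, len(level), 2):
            pair = level[i] + (level[i + 1] if i + 1 < len(level) else [])
            merged.append(reduce_tokens(pair, keep))
        level = merged
    return level[0]


if __name__ == "__main__":
    # Split a long "document" of 32 tokens into 8 chunks of 4 tokens each,
    # then merge them hierarchically into a single bounded-size context.
    tokens = [f"t{i}" for i in range(32)]
    chunks = [tokens[i:i + 4] for i in range(0, len(tokens), 4)]
    print(hierarchical_merge(chunks, keep=6))
```

The key property the sketch captures is that each merge step operates on a bounded number of tokens, so the context that reaches the final levels stays a fixed size regardless of how long the original input was.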