Make Your LLM Fully Utilize the Context

26 Apr 2024 | Shengnan An, Zexiong Ma, Zeqi Lin, Nanning Zheng, Jian-Guang Lou
The paper addresses the challenge of long-context utilization in large language models (LLMs), known as the "lost-in-the-middle" problem: LLMs struggle to use information located in the middle of a long context, even though they can readily use information at its beginning and end. The authors propose INformation-INtensive (IN2) training, a data-driven solution that teaches the model that crucial information can appear anywhere in the context, not just at its edges. IN2 training builds a synthesized long-context question-answer dataset in which answering requires (1) fine-grained awareness of information within a short segment placed somewhere in the long context, and (2) integration and reasoning over information spread across multiple segments.

Applying IN2 training to Mistral-7B yields FILM-7B (FILl-in-the-Middle). To evaluate it, the authors design three probing tasks covering different context styles (document, code, and structured data) and information retrieval patterns. The results show that FILM-7B significantly improves long-context information awareness, outperforms or matches proprietary LLMs such as GPT-4-Turbo on real-world long-context tasks, and maintains comparable performance on short-context tasks. The paper also discusses training strategies, including the effectiveness of sliding windows and RoPE base adjustments.
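To make the data-synthesis idea behind IN2 training concrete, here is a minimal sketch of how such a long-context QA example could be assembled: the answer-bearing segments are scattered at random positions among unrelated filler segments, so the model cannot rely on information sitting at the start or end of the context. The function names, the segment size, and the stub QA generator below are illustrative assumptions, not the paper's actual pipeline (which uses a strong LLM such as GPT-4 to generate the question-answer pairs).

```python
import random

def build_in2_example(key_segments, filler_segments, total_segments, qa_generator, rng=random):
    """Scatter the key segments at random positions inside a long synthetic context.

    key_segments:    short passages (e.g., ~128 tokens each) that the answer depends on
    filler_segments: unrelated passages used to pad the context to full length
    total_segments:  number of segments in the final long context
    qa_generator:    callable(key_segments) -> (question, answer)
    """
    # Sample enough fillers to reach the target context length.
    context = rng.sample(filler_segments, total_segments - len(key_segments))

    # Insert each key segment at a random position so crucial information
    # can land anywhere in the context, not just at the beginning or end.
    for segment in key_segments:
        context.insert(rng.randrange(len(context) + 1), segment)

    question, answer = qa_generator(key_segments)
    return {"context": "\n\n".join(context), "question": question, "answer": answer}


# Example usage with a stub QA generator (for illustration only):
example = build_in2_example(
    key_segments=["The launch code is 7421."],
    filler_segments=[f"Filler paragraph {i}." for i in range(1000)],
    total_segments=200,
    qa_generator=lambda segs: ("What is the launch code?", "7421"),
)
```

Training on many such examples, with the key segments drawn from varied positions and questions that require combining several segments, is what pushes the model toward uniform information awareness across the full context window.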