2024-07-03 | Cheng-Yu Hsieh, Yung-Sung Chuang, Chun-Liang Li, Zifeng Wang, Long T. Le, Abhishek Kumar, James Glass, Alexander Ratner, Chen-Yu Lee, Ranjay Krishna, Tomas Pfister
The paper "Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization" addresses the issue of large language models (LLMs) struggling to capture relevant information located in the middle of their input, known as the "lost-in-the-middle" problem. The authors make three main contributions:
1. **Understanding the Problem**: They establish a connection between the lost-in-the-middle phenomenon and LLMs' intrinsic attention bias, which exhibits a U-shaped pattern where tokens at the beginning and end of the input receive higher attention, regardless of their relevance.
2. **Calibration Mechanism**: They propose a calibration mechanism called "found-in-the-middle" to mitigate this positional bias. This mechanism disentangles the positional bias from the model's attention, allowing the model to attend to relevant contexts based on their relevance, regardless of their position.
3. **Performance Improvement**: The authors demonstrate that the found-in-the-middle mechanism not only improves the model's ability to locate relevant information within long contexts but also enhances retrieval-augmented generation (RAG) performance across various tasks, outperforming existing methods by up to 15 percentage points.
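The calibration idea in item 2 can be illustrated with a minimal sketch. Assume, as the paper argues, that observed attention is roughly the sum of content relevance and a position-dependent bias; the bias can then be estimated by measuring the attention given to an irrelevant placeholder document at each position and subtracted out. All function names and numbers below are illustrative, not the authors' implementation:

```python
import numpy as np

def estimate_positional_bias(attn_to_dummy):
    # attn_to_dummy[p] = attention the model assigns an *irrelevant*
    # document placed at position p; with no content signal, this
    # reading approximates the positional bias itself.
    return np.asarray(attn_to_dummy, dtype=float)

def calibrate(observed_attn, positional_bias):
    # Subtract the positional component so documents can be ranked
    # by content relevance alone, regardless of where they appear.
    return np.asarray(observed_attn, dtype=float) - positional_bias

# U-shaped bias: high at the edges of the context, low in the middle.
bias = estimate_positional_bias([0.30, 0.10, 0.05, 0.10, 0.30])

# Raw attention over five documents; the truly relevant one sits in
# the middle (index 2) but is outscored by the biased edge positions.
observed = np.array([0.32, 0.12, 0.25, 0.11, 0.31])

calibrated = calibrate(observed, bias)
print(int(np.argmax(observed)))    # -> 0: raw attention picks an edge document
print(int(np.argmax(calibrated)))  # -> 2: calibration recovers the middle one
```

In the paper, the calibrated attention is then used to let the model attend to (or re-rank) contexts by relevance, which is what drives the reported RAG gains.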
The paper includes experimental setups and results to validate the effectiveness of the proposed calibration mechanism, showing that it can improve RAG performance by up to 15 percentage points on the NaturalQuestions dataset. The findings suggest that LLMs can indeed capture relevant information from the middle of long inputs, but their performance is hindered by the positional bias. The work opens up future directions for understanding and addressing LLM attention biases.