Hopping Too Late: Exploring the Limitations of Large Language Models on Multi-Hop Queries

18 Jun 2024 | Eden Biran, Daniela Gottesman, Sohee Yang, Mor Geva, Amir Globerson
This paper investigates the limitations of large language models (LLMs) in handling multi-hop queries, focusing on how they process information internally. The study reveals that LLMs resolve the first hop of a two-hop query in early layers, while the second hop is resolved in later layers. This sequential processing can cause failures on the full query: if the first hop is resolved too late, the remaining layers may lack the knowledge or functionality needed to resolve the second hop and predict the correct answer.

To address this, the authors propose a novel "back-patching" method, in which a hidden representation from a later layer is patched back into an earlier layer, giving the model renewed access to the computation performed by earlier layers. This intervention corrects up to 57% of previously incorrect cases, indicating that later layers can indeed lack the functionality required to complete certain tasks.

The analysis is based on a new dataset of 82,020 two-hop queries constructed from Wikidata. The results show that the first hop is resolved in early layers and that the resulting information propagates to the last token position, where the second hop is resolved. The findings suggest an inherent limitation of the transformer architecture: information may not be fully resolved before it is needed for subsequent computation steps, which could explain why LLMs answer some multi-hop queries incorrectly. Back-patching offers a promising way to improve LLM performance on multi-hop question answering, and the study underscores the importance of understanding the internal mechanisms of LLMs in order to improve their reasoning capabilities.
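The back-patching idea can be sketched in a few lines. The toy code below is a minimal illustration under simplifying assumptions, not the paper's implementation: layers are stand-in functions rather than real transformer blocks, and the `run_layers`, `back_patch`, `src_layer`, and `tgt_layer` names are hypothetical. The core mechanic matches the description above: capture the hidden state produced at a later layer, re-inject it as the input to an earlier layer, and rerun the remaining layers.

```python
# Toy sketch of back-patching (illustrative stand-in, not the paper's code).
# "Layers" here are arbitrary functions on a hidden state.

def run_layers(layers, h, start=0):
    """Run layers[start:] on hidden state h, recording each layer's output."""
    states = {}
    for i in range(start, len(layers)):
        h = layers[i](h)
        states[i] = h
    return h, states

def back_patch(layers, h0, src_layer, tgt_layer):
    """Patch the output of a later layer (src_layer) back in as the
    input to an earlier layer (tgt_layer), then rerun from there."""
    assert tgt_layer < src_layer, "must patch a later state into an earlier layer"
    _, states = run_layers(layers, h0)        # normal forward pass
    patched = states[src_layer]               # later-layer representation
    out, _ = run_layers(layers, patched, start=tgt_layer)  # rerun earlier layers on it
    return out
```

For example, with four toy layers that each add 1 to the state, a normal pass from `h0 = 0` yields 4, while `back_patch(layers, 0, 2, 1)` takes the layer-2 state (3) and reruns layers 1-3 on it, yielding 6. In the paper's setting, this rerun gives the late-resolved first-hop representation another chance to pass through the early layers that perform second-hop resolution.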