How to think step-by-step: A mechanistic understanding of chain-of-thought reasoning


6 May 2024 | Subhabrata Dutta, Joykirat Singh, Soumen Chakrabarti, Tanmoy Chakraborty
This paper investigates the neural mechanisms underlying Chain-of-Thought (CoT) reasoning in Large Language Models (LLMs), focusing on the Llama-2 7B model. The authors explore how LLMs generate step-by-step answers to multi-step reasoning problems posed over fictional ontologies. Key findings include:

1. **Parallel Pathways of Answer Generation**: LLMs use multiple parallel pathways to generate answers, with attention heads writing the answer token in the later stages of the model.
2. **Functional Rift in the Middle Layers**: There is a functional shift in the middle layers of the model: attention heads that move information along ontological relationships appear in the initial half, while those that write the answer token appear in the later half.
3. **Information Mixing**: Attention heads perform information mixing between ontologically related tokens, a process that starts from the first layer and peaks around the 16th decoder block.
4. **Answer Collection Sources**: Attention heads collect answer tokens from the generated context, the question context, and the few-shot context, indicating that LLMs draw on multiple sources of information to generate answers.
5. **Context Abidance**: The model starts focusing on contextual information at deeper layers, with a visible correlation between the depth of attention heads and their adherence to context (see the sketch below for a simple way to inspect this layer by layer).

These findings provide a mechanistic understanding of how LLMs perform CoT reasoning, highlighting the internal mechanisms and functional components involved in this process.
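The layer-wise findings above (information mixing peaking mid-model, context abidance at deeper layers) can be probed by reading out attention patterns per decoder block. The following is a minimal illustrative sketch, not the paper's actual probing pipeline (which uses fictional-ontology prompts and targeted interventions): it loads Llama-2 7B via Hugging Face Transformers and measures, for each decoder block, how much attention the final position places on the preceding context tokens. The checkpoint name and the toy prompt are assumptions for illustration.

```python
# Sketch: per-layer attention mass from the last position onto context tokens
# in Llama-2 7B. Assumed checkpoint and prompt; any causal LM would work here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # illustrative assumption
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="eager",  # ensure attention weights are returned
)
model.eval()

# Toy fictional-ontology-style prompt (placeholder, not from the paper's dataset)
prompt = (
    "Every wumpus is a yumpus. Max is a wumpus. "
    "Question: Is Max a yumpus? Let's think step by step."
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# out.attentions: one (batch, heads, seq, seq) tensor per decoder block.
# For each block, average over heads the attention mass the final position
# assigns to all earlier context positions (excluding itself).
for layer_idx, attn in enumerate(out.attentions):
    last_to_context = attn[0, :, -1, :-1]              # (heads, seq-1)
    context_mass = last_to_context.sum(dim=-1).mean().item()
    print(f"layer {layer_idx:2d}: mean attention mass on context = {context_mass:.3f}")
```

A rising trend in this per-layer context mass would be consistent with the reported observation that deeper attention heads adhere more strongly to the context; attributing answer-writing heads, as in the paper, additionally requires projecting head outputs onto the answer token's direction, which this sketch does not do.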