How to think step-by-step: A mechanistic understanding of chain-of-thought reasoning

6 May 2024 | Subhabrata Dutta, Joykirat Singh, Soumen Chakrabarti, Tanmoy Chakraborty
This paper investigates the internal mechanisms that enable Chain-of-Thought (CoT) reasoning in Large Language Models (LLMs). The study focuses on Llama-2 7B and uses the PrOntoQA dataset, whose questions are built over fictional ontologies, to analyze how the model performs multi-step reasoning while controlling for memorized world knowledge.

The analysis reveals that the model carries out step-by-step reasoning through multiple parallel pathways, with different functional components handling different aspects of the task. The initial layers are biased toward the pretraining prior, while the later layers are dominated by the in-context prior. Attention heads in the early layers move information between ontologically related entities, whereas heads in the later layers are responsible for writing the answer tokens. The study also identifies a functional rift in the middle of the network: around the 16th decoder block, both the content of the residual stream and the functionality of the attention heads undergo a phase shift.

When producing an answer, the model draws simultaneously on the context it has generated via CoT, the question context, and the few-shot demonstrations, and different subtasks rely on different pathways. These findings provide empirical evidence that LLMs generate answers through parallel pathways to which many attention heads contribute, and they underscore the complexity of these models and the need for further investigation of their internal mechanisms.
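To make the kind of layer-wise residual-stream probing described above more concrete, below is a minimal logit-lens-style sketch. It is not the authors' code: the HuggingFace checkpoint name, the toy PrOntoQA-style prompt, and the choice of projecting each layer's residual stream through the final norm and unembedding are all assumptions made for illustration.

```python
# Minimal logit-lens-style probe of the residual stream in Llama-2 7B.
# Assumptions (not from the paper): the checkpoint identifier, the toy prompt,
# and the use of the final RMSNorm + unembedding as the probe. The paper's
# actual analysis is considerably more involved than this sketch.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

# A small fictional-ontology prompt, truncated just before the answer token.
prompt = (
    "Every wumpus is a yumpus. Max is a wumpus.\n"
    "Question: Is Max a yumpus?\n"
    "Reasoning: Max is a wumpus. Every wumpus is a yumpus. So Max is a yumpus.\n"
    "Answer:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True, output_attentions=True)

# out.hidden_states contains the residual stream after the embedding layer and
# after each of the 32 decoder blocks: a tuple of 33 tensors [1, seq, d_model].
unembed = model.get_output_embeddings().weight  # [vocab, d_model]
final_norm = model.model.norm                   # Llama's final RMSNorm

for layer, h in enumerate(out.hidden_states):
    # Project the last position's residual stream into vocabulary space to see
    # at which depth the eventual answer token starts to dominate.
    logits = final_norm(h[0, -1]) @ unembed.T
    top_token = tokenizer.decode(logits.argmax().item())
    print(f"layer {layer:2d}: top token -> {top_token!r}")
```

Under this framing, a change in which token dominates around the middle decoder blocks would be consistent with the reported rift near the 16th layer; the per-head patterns in out.attentions can be inspected analogously for the head-level claims.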