This paper investigates how well large language models (LLMs) with chain-of-thought (CoT) reasoning mimic human reasoning. Using causal analysis, the study compares the reasoning processes of LLMs and humans, examining the relationships among problem instructions, reasoning steps, and final answers. The analysis reveals that LLMs often deviate from the ideal causal chain, producing spurious correlations and consistency errors between reasoning steps and answers; as a result, LLMs may generate unfaithful explanations whose stated reasoning does not reflect the true basis for their answers. The study also examines factors that influence the causal structure: in-context learning strengthens it, while post-training techniques such as supervised fine-tuning and reinforcement learning from human feedback weaken it, and increasing model size alone does not necessarily improve it. Together, these findings underscore the importance of understanding the causal mechanisms underlying LLM reasoning and point to the need for new techniques that strengthen causal structure and move LLMs toward human-level reasoning.
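To make the kind of causal probe described above concrete, the sketch below illustrates one simple intervention-style check: perturb the generated reasoning steps and see whether the final answer changes. This is a minimal illustration under stated assumptions, not the paper's actual protocol; `query_model` is a hypothetical stand-in for whatever LLM client is available, and the answer-extraction format is assumed.

```python
# Minimal sketch of a CoT intervention probe (illustrative only; the paper's
# actual causal-analysis procedure may differ). `query_model` is a hypothetical
# stand-in for an LLM client and must be supplied by the reader.

def query_model(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM client here")

def split_cot_and_answer(output: str) -> tuple[str, str]:
    """Assumes the model ends its response with a line 'Answer: <x>'."""
    reasoning, _, answer = output.rpartition("Answer:")
    return reasoning.strip(), answer.strip()

def answer_depends_on_cot(instruction: str, perturb) -> bool:
    """Intervene on the reasoning steps and check whether the answer changes.

    If the answer is unchanged under a corrupted chain of thought, the stated
    reasoning is likely not the true cause of the answer (an unfaithful CoT).
    """
    # Baseline: let the model reason freely, then extract its CoT and answer.
    baseline = query_model(
        f"{instruction}\nThink step by step, then end with 'Answer: <x>'."
    )
    cot, answer = split_cot_and_answer(baseline)

    # Intervention: feed back a perturbed CoT (e.g. with a key step corrupted)
    # and ask for the final answer conditioned on that reasoning.
    perturbed = query_model(
        f"{instruction}\nReasoning: {perturb(cot)}\nEnd with 'Answer: <x>'."
    )
    _, new_answer = split_cot_and_answer(perturbed)

    return new_answer != answer
```

As a usage example, a perturbation that corrupts an intermediate arithmetic step should flip the answer if the answer truly follows from the stated reasoning; an unchanged answer would be evidence of the spurious instruction-to-answer shortcut the paper describes.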