This paper introduces Diffusion-of-Thought (DoT), a novel approach that integrates diffusion models with Chain-of-Thought (CoT) reasoning to enhance the reasoning ability of diffusion language models. Unlike autoregressive models, which generate tokens sequentially, DoT allows reasoning steps to diffuse over time through a diffusion model, offering greater flexibility in trading off computation for reasoning performance. The method generates reasoning paths alongside diffusion timesteps, enabling more efficient and accurate reasoning. DoT also incorporates self-correction mechanisms and benefits from existing reasoning-enhancing techniques such as self-consistency decoding. Experimental results show that DoT outperforms autoregressive baselines in both efficiency and accuracy on tasks such as multi-digit multiplication, boolean logic, and grade-school math problems. The paper also examines the trade-off between reasoning time and performance, demonstrating that DoT can achieve substantial speed-ups while maintaining high accuracy. Additionally, DoT exhibits strong self-correction capabilities, allowing it to recover from errors in previous reasoning steps. The study highlights the potential of diffusion language models in complex reasoning tasks and suggests that further research is needed to improve their performance and scalability.
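To make the core idea concrete, the sketch below illustrates DoT-style inference under toy assumptions: an entire reasoning path is initialized as noise and refined jointly across diffusion timesteps rather than token by token, and several independently sampled paths are combined via self-consistency voting. The denoiser `denoise_step`, the constants `T`, `SEQ_LEN`, and `VOCAB`, and the use of the last token as the "answer" are all hypothetical placeholders for illustration, not the paper's actual model or architecture.

```python
import numpy as np
from collections import Counter

T = 64        # number of diffusion timesteps (fewer steps -> faster, coarser reasoning)
SEQ_LEN = 32  # length of the reasoning path (rationale plus answer); placeholder value
VOCAB = 50    # toy vocabulary size

def denoise_step(tokens: np.ndarray, t: int, rng: np.random.Generator) -> np.ndarray:
    # Hypothetical stand-in for a trained diffusion LM denoiser: given the noisy
    # thought tokens and the current timestep, return a refined estimate of the
    # WHOLE rationale in parallel (not left-to-right). A real model would predict
    # the clean sequence from the noisy one with a Transformer.
    target = np.arange(tokens.shape[0]) % VOCAB   # fake "clean" rationale
    keep_noise = rng.random(tokens.shape) < t / T  # less noise kept as t -> 0
    return np.where(keep_noise, tokens, target)

def sample_reasoning_path(rng: np.random.Generator) -> np.ndarray:
    # Start from pure noise over the full reasoning path and refine it jointly
    # across timesteps; the timestep budget T is the compute/accuracy knob.
    x = rng.integers(0, VOCAB, size=SEQ_LEN)
    for t in range(T, 0, -1):
        x = denoise_step(x, t, rng)
    return x

# Self-consistency decoding: sample several independent reasoning paths and
# majority-vote on the final answer (here the last token stands in for it).
rng = np.random.default_rng(0)
answers = [int(sample_reasoning_path(rng)[-1]) for _ in range(5)]
print("voted answer:", Counter(answers).most_common(1)[0][0])
```

The sketch reflects the two levers the summary describes: shrinking `T` trades reasoning quality for speed, and because every denoising step revisits all positions at once, an error introduced at an earlier step can still be overwritten later, which is the intuition behind the self-correction behavior.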