This paper introduces Reparameterized Absorbing Discrete Diffusion (RADD), a diffusion model whose network characterizes time-independent conditional probabilities of clean data. The key contribution is the observation that the concrete score in absorbing diffusion can be expressed as conditional probabilities of clean data multiplied by a time-dependent scalar. This insight leads to RADD, which simplifies the model architecture and reduces the number of function evaluations (NFEs): because the network no longer depends on time, its output can be cached and reused whenever the noisy sample is unchanged between sampling steps. RADD achieves state-of-the-art performance on five zero-shot language modeling benchmarks at the GPT-2 scale, demonstrating superior efficiency and effectiveness compared to existing models. The paper also unifies absorbing discrete diffusion and any-order autoregressive models (AO-ARMs), showing that their training objectives are equivalent. This unification provides a fresh perspective on the negative log-likelihood of absorbing discrete diffusion and offers alternative objective functions for training and likelihood evaluation.
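
As a schematic illustration of this factorization (the notation below is illustrative and may not match the paper's exact statement), for a noisy sample $x_t$ with a mask at position $i$, the concrete score takes the form

$$\frac{p_t(\hat{x}_t)}{p_t(x_t)} \;=\; \frac{e^{-\bar{\sigma}(t)}}{1 - e^{-\bar{\sigma}(t)}}\; p_0\!\left(\hat{x}^i \mid x_t^{\mathrm{UM}}\right),$$

where $\hat{x}_t$ replaces the mask at position $i$ of $x_t$ with the token $\hat{x}^i$, $\bar{\sigma}(t)$ is the cumulative noise, and $x_t^{\mathrm{UM}}$ denotes the unmasked tokens of $x_t$. Only the second, time-independent factor needs to be learned, which is why the network can drop its time input and why its output can be cached across sampling steps in which $x_t$ does not change.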