6 Jun 2024 | Jiaxin Shi, Kehang Han, Zhe Wang, Arnaud Doucet, Michalis K. Titsias
This paper introduces a simplified and generalized masked diffusion framework for discrete data, aiming to unlock the full potential of masked diffusion models. The authors show that the continuous-time variational objective of masked diffusion models can be expressed as a weighted integral of cross-entropy losses, and that this formulation supports training generalized masked diffusion models with state-dependent masking schedules.

When evaluated on OpenWebText, their models outperform prior diffusion language models at GPT-2 scale and perform well on zero-shot language modeling tasks. On pixel-level image modeling, their models achieve performance competitive with or better than autoregressive models, reaching 2.78 bits per dimension on CIFAR-10 and 3.42 bits per dimension on ImageNet 64×64.

The paper also relates masked diffusion models to existing work, including continuous-time Markov chains and score parameterization. The generalization to state-dependent masking schedules yields improved predictive performance: in experiments, the models outperform previous discrete diffusion models on text and image data, and the generalized model further improves likelihoods. The authors note remaining limitations, such as masked diffusion models not yet matching autoregressive models on some tasks and being prone to overfitting. They conclude that the framework offers a simple and effective approach to masked diffusion, with promising results on discrete data tasks.
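Schematically, the weighted-cross-entropy form of the objective can be written as follows. The notation here is assumed for illustration: alpha_t is the masking schedule (the probability a token remains unmasked at time t), m is the mask token, x is the one-hot clean token, and mu_theta is the model's predicted distribution over clean tokens.

\[
\mathcal{L}(x) \;=\; \int_0^1 \frac{\alpha_t'}{1-\alpha_t}\,
\mathbb{E}_{q(z_t \mid x)}\!\left[\,\delta_{z_t,\,m}\; x^\top \log \mu_\theta(z_t, t)\,\right] \mathrm{d}t .
\]

Since \(\alpha_t' \le 0\) and \(\log \mu_\theta \le 0\), the integrand is a nonnegative, schedule-weighted cross-entropy evaluated only at masked positions.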
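To make the weighted-cross-entropy objective concrete, here is a minimal NumPy sketch of a Monte Carlo estimator under a simple linear masking schedule. All names, the schedule, and the toy uniform "model" are illustrative assumptions for this sketch, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB = 4     # toy vocabulary size; [MASK] is an extra index
MASK = VOCAB
L = 8         # sequence length

def alpha(t):
    """Linear masking schedule: probability a token is still unmasked at time t."""
    return 1.0 - t

def alpha_prime(t):
    return -1.0

def loss_estimate(x, log_prob_fn, n_samples=2000):
    """Monte Carlo estimate of the weighted integral of cross-entropy losses:
    each sample weighs the cross-entropy of the model's clean-token prediction,
    summed over masked positions, by -alpha'(t) / (1 - alpha(t))."""
    total = 0.0
    for _ in range(n_samples):
        t = rng.uniform(0.05, 1.0)            # truncate near t=0 to tame variance
        keep = rng.random(L) < alpha(t)       # each token survives with prob alpha(t)
        z = np.where(keep, x, MASK)           # partially masked sequence z_t
        logp = log_prob_fn(z, t)              # (L, VOCAB) log-probs over clean tokens
        ce = -logp[np.arange(L), x]           # per-position cross-entropy vs. the data
        w = -alpha_prime(t) / (1.0 - alpha(t))
        total += w * ce[~keep].sum()          # only masked positions contribute
    return total / n_samples

# Toy "model": uniform prediction, standing in for a trained network.
x = rng.integers(0, VOCAB, size=L)
uniform_model = lambda z, t: np.log(np.full((L, VOCAB), 1.0 / VOCAB))
est = loss_estimate(x, uniform_model)
# For the linear schedule this estimate concentrates around L * log(VOCAB).
```

With the linear schedule, the weight 1/t exactly cancels the expected fraction of masked tokens, so the uniform-model estimate concentrates around L * log(VOCAB); a state-dependent schedule would replace the scalar alpha(t) with a per-token (or per-state) masking probability and adjust the weight accordingly.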