Simple and Effective Masked Diffusion Language Models


11 Jun 2024 | Subham Sekhar Sahoo, Marianne Arriola, Yair Schiff, Aaron Gokaslan, Edgar Marroquin, Justin T Chiu, Alexander Rush, Volodymyr Kuleshov
This paper introduces a simple and effective masked diffusion language model (MDLM) that outperforms previous diffusion models on language modeling tasks and narrows the gap to autoregressive (AR) models. MDLM is trained with a simplified objective, a weighted average of masked language modeling (MLM) losses, which enables efficient training of encoder-only models with semi-autoregressive generation capabilities. The model uses a substitution-based parameterization (SUBS) of the reverse diffusion process, which admits a Rao-Blackwellized objective with improved tightness and lower variance of the ELBO. These choices lead to significant improvements on language modeling benchmarks: MDLM achieves a new state-of-the-art among diffusion models and approaches the perplexity of AR models. The approach also extends to non-language domains, including biological sequence modeling, where it matches or outperforms classical BERT-style training. The paper additionally presents efficient samplers that support semi-autoregressive generation and shows that simple engineering choices substantially improve performance. Code for MDLM is available at https://github.com/kuleshov-group/mdlm.
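
To make the "weighted average of MLM losses" concrete, below is a minimal PyTorch-style sketch of such a diffusion training loss. It is an illustration only, not the authors' implementation (see the linked repository for that): the `model` interface, `MASK_ID`, and the `alpha`/`alpha_prime` schedule callables are hypothetical placeholders chosen for this example.

```python
import torch
import torch.nn.functional as F

MASK_ID = 0  # hypothetical mask-token id; the real index depends on the tokenizer


def masked_diffusion_loss(model, x, t, alpha, alpha_prime):
    """Sketch of a schedule-weighted masked-language-modeling diffusion loss.

    x:           (batch, length) clean token ids
    t:           (batch,) diffusion times sampled uniformly in (0, 1]
    alpha:       callable, noise schedule alpha(t) in [0, 1] (prob. a token stays unmasked)
    alpha_prime: callable, derivative d alpha / dt (negative for a decreasing schedule)
    """
    a_t = alpha(t)                                              # (batch,)
    # Forward process: independently mask each token with probability 1 - alpha(t).
    keep = torch.rand_like(x, dtype=torch.float) < a_t[:, None]
    z_t = torch.where(keep, x, torch.full_like(x, MASK_ID))

    # Denoiser predicts the clean tokens from the partially masked sequence.
    logits = model(z_t, t)                                      # (batch, length, vocab)
    ce = F.cross_entropy(logits.transpose(1, 2), x, reduction="none")  # (batch, length)

    # Only masked positions contribute, weighted by the schedule-dependent
    # coefficient -alpha'(t) / (1 - alpha(t)) from the continuous-time ELBO.
    weight = -alpha_prime(t) / (1.0 - a_t)                      # (batch,)
    masked_ce = (ce * (~keep).float()).sum(dim=1)
    return (weight * masked_ce).mean()
```

For example, with a linear schedule alpha(t) = 1 - t the weight reduces to 1/t, so the objective is simply a time-weighted average of standard MLM cross-entropy losses, which is what makes the training recipe compatible with ordinary encoder-only architectures.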