25 Jan 2024 | Xinlei Chen, Zhuang Liu, Saining Xie, Kaiming He
This study examines the representation learning capabilities of Denoising Diffusion Models (DDMs), which were originally designed for image generation. The authors deconstruct a DDM step by step, gradually transforming it into a classical Denoising Autoencoder (DAE), to understand how each component influences self-supervised representation learning. They find that only a few modern components are critical for learning good representations, while many others are nonessential. The study ultimately arrives at a highly simplified architecture that closely resembles a classical DAE; its main critical component is a tokenizer that creates a low-dimensional latent space. The authors also find that the denoising-driven process, rather than the diffusion-driven process, is what matters most for learning good representations. The proposed "Latent Denoising Autoencoder" (l-DAE) achieves competitive self-supervised learning performance: it outperforms off-the-shelf DDMs and, while it still falls short of contrastive learning methods, it significantly narrows the gap. The study suggests that there is room for further research in the direction of DAEs and DDMs.
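To make the core idea concrete, here is a minimal PyTorch sketch of the latent-denoising setup the summary describes: a fixed linear (PCA-style) tokenizer maps image patches into a low-dimensional latent space, Gaussian noise is added to the latents, and a small network is trained to recover the clean latents. All names, dimensions, and the Transformer denoiser are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of a latent-DAE training step (illustrative, not the paper's exact setup).
import torch
import torch.nn as nn

patch_dim, latent_dim, num_patches = 16 * 16 * 3, 16, 196  # hypothetical sizes

class PCATokenizer(nn.Module):
    """Projects image patches into a low-dimensional latent space with a
    fixed (pre-computed) linear basis, and maps latents back to pixels."""
    def __init__(self, basis):                      # basis: (patch_dim, latent_dim)
        super().__init__()
        self.register_buffer("basis", basis)
    def encode(self, patches):                      # patches: (B, N, patch_dim)
        return patches @ self.basis
    def decode(self, latents):                      # latents: (B, N, latent_dim)
        return latents @ self.basis.T

class LatentDenoiser(nn.Module):
    """Small Transformer that predicts the clean latent from a noisy latent."""
    def __init__(self, dim=latent_dim, width=256, depth=4):
        super().__init__()
        self.inp = nn.Linear(dim, width)
        layer = nn.TransformerEncoderLayer(width, nhead=8, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)
        self.out = nn.Linear(width, dim)
    def forward(self, z_noisy):
        return self.out(self.blocks(self.inp(z_noisy)))

def training_step(tokenizer, denoiser, patches, sigma=0.5):
    z = tokenizer.encode(patches)                   # clean low-dimensional latents
    z_noisy = z + sigma * torch.randn_like(z)       # add Gaussian noise in latent space
    pred = denoiser(z_noisy)
    return ((pred - z) ** 2).mean()                 # denoising (reconstruction) loss

# Usage with random tensors standing in for pre-extracted image patches.
tok = PCATokenizer(torch.randn(patch_dim, latent_dim))
net = LatentDenoiser()
patches = torch.randn(8, num_patches, patch_dim)
loss = training_step(tok, net, patches)
loss.backward()
```

After such training, the denoiser's intermediate activations would serve as the self-supervised representation, e.g. evaluated with a linear probe on a downstream classification task.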