NVAE: A Deep Hierarchical Variational Autoencoder

8 Jan 2021 | Arash Vahdat, Jan Kautz
NVAE is a deep hierarchical variational autoencoder (VAE) designed for image generation, built with depthwise separable convolutions and batch normalization. It introduces a residual parameterization of Normal distributions and employs spectral regularization to stabilize training.

NVAE achieves state-of-the-art results on MNIST, CIFAR-10, CelebA 64, CelebA HQ, and FFHQ, outperforming previous non-autoregressive models and reducing the gap with autoregressive models. It produces high-quality images and is the first successful VAE applied to natural images as large as 256×256 pixels.

NVAE's design addresses the main challenges of neural architecture design for VAEs: modeling long-range correlations, handling hierarchical latent structures, and ensuring training stability. Its key contributions are a deep hierarchical VAE built from depthwise-convolutional residual cells, a residual parameterization of the approximate posteriors, spectral regularization for training stability, and practical techniques to reduce the memory burden. NVAE demonstrates that carefully designed neural architectures can achieve state-of-the-art results in image generation. The source code is available at https://github.com/NVlabs/NVAE.
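To make the two named techniques more concrete, the sketches below give minimal PyTorch illustrations. They are not the repository's implementation; the names (`ResidualNormal`, `delta_mu`, `delta_log_sigma`, `spectral_penalty`) are illustrative only. The first sketch expresses an approximate posterior as a Normal whose mean and scale are offsets on the corresponding prior, which simplifies the per-group KL term:

```python
import torch


class ResidualNormal:
    """Sketch (not NVAE's code) of a Normal posterior expressed relative
    to its prior: q(z | x) = N(mu_p + dmu, sigma_p * dsigma)."""

    def __init__(self, prior_mu, prior_log_sigma, delta_mu, delta_log_sigma):
        self.prior_log_sigma = prior_log_sigma
        self.delta_mu = delta_mu
        self.delta_log_sigma = delta_log_sigma
        # Posterior parameters are offsets on the prior's parameters.
        self.mu = prior_mu + delta_mu
        self.log_sigma = prior_log_sigma + delta_log_sigma

    def sample(self):
        # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I).
        eps = torch.randn_like(self.mu)
        return self.mu + torch.exp(self.log_sigma) * eps

    def kl_to_prior(self):
        # KL(q || p) depends only on the residual terms and the prior scale:
        # 0.5 * (dmu^2 / sigma_p^2 + dsigma^2 - 1) - log(dsigma)
        delta_sigma_sq = torch.exp(2.0 * self.delta_log_sigma)
        prior_sigma_sq = torch.exp(2.0 * self.prior_log_sigma)
        return (0.5 * (self.delta_mu ** 2 / prior_sigma_sq + delta_sigma_sq - 1.0)
                - self.delta_log_sigma)


# Toy usage: one latent group with 8 latent variables.
prior_mu, prior_log_sigma = torch.zeros(8), torch.zeros(8)
delta_mu, delta_log_sigma = 0.1 * torch.randn(8), 0.05 * torch.randn(8)
q = ResidualNormal(prior_mu, prior_log_sigma, delta_mu, delta_log_sigma)
z = q.sample()
kl = q.kl_to_prior().sum()
```

When the residual offsets are zero, the posterior collapses onto the prior and the KL term vanishes, which gives an intuition for why this parameterization eases optimization of the hierarchical KL terms. The second sketch shows one plausible form of a spectral regularization term: the sum of the estimated largest singular values of the weight matrices, computed by power iteration and added to the training loss with a coefficient.

```python
import torch
import torch.nn.functional as F


def spectral_penalty(model, n_power_iters=4):
    """Sketch (not NVAE's implementation) of a spectral regularization term:
    the sum of the largest singular values of each conv/linear weight,
    estimated by power iteration. Add lambda * spectral_penalty(model)
    to the training loss."""
    penalty = 0.0
    for module in model.modules():
        weight = getattr(module, "weight", None)
        if weight is None or weight.dim() < 2:
            continue  # skip biases, norm layers, etc.
        w = weight.reshape(weight.shape[0], -1)  # flatten to a 2D matrix
        with torch.no_grad():
            # Power iteration to approximate the leading singular vectors.
            u = torch.randn(w.shape[0], device=w.device)
            for _ in range(n_power_iters):
                v = F.normalize(w.t() @ u, dim=0)
                u = F.normalize(w @ v, dim=0)
        # Estimated largest singular value, differentiable w.r.t. the weight.
        penalty = penalty + torch.dot(u, w @ v)
    return penalty
```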