Neural Discrete Representation Learning

30 May 2018 | Aaron van den Oord, Oriol Vinyals, Koray Kavukcuoglu
The paper introduces the Vector Quantised-Variational AutoEncoder (VQ-VAE), a generative model that learns discrete latent representations. Unlike traditional VAEs, the VQ-VAE uses discrete latent variables and a learned prior rather than a static one, addressing the "posterior collapse" that often occurs when continuous-latent VAEs are paired with powerful autoregressive decoders. The model combines vector quantisation (VQ) with the variational autoencoder framework: encoder outputs are snapped to their nearest vectors in a learned codebook, making the latents discrete by construction while avoiding the high-variance gradient estimates that usually complicate training with discrete variables. Training minimises a combination of a reconstruction loss, a VQ loss that moves codebook vectors toward the encoder outputs, and a commitment loss that keeps the encoder committed to its chosen codebook entries. The model is evaluated on image generation, speech processing, and video generation, where it learns high-quality representations and supports tasks such as speaker conversion and unsupervised phoneme learning. The VQ-VAE matches continuous VAEs in log-likelihood while retaining the benefits of discrete latent distributions.
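For concreteness, here is a minimal PyTorch sketch of the quantisation step and the two auxiliary loss terms described above. The module name, hyperparameter names, and default values (num_codes, code_dim) are illustrative assumptions, not taken from the authors' implementation; the commitment weight beta = 0.25 is the value the paper reports using.

```python
import torch
import torch.nn.functional as F
from torch import nn


class VectorQuantizer(nn.Module):
    """Sketch of the VQ bottleneck: nearest-neighbour lookup into a codebook."""

    def __init__(self, num_codes: int = 512, code_dim: int = 64, beta: float = 0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)  # e_1, ..., e_K
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta  # commitment weight (the paper uses 0.25)

    def forward(self, z_e: torch.Tensor):
        # z_e: encoder outputs, shape (batch, code_dim).
        # Discretise: each encoder output snaps to its nearest codebook vector.
        distances = torch.cdist(z_e, self.codebook.weight)  # (batch, num_codes)
        indices = distances.argmin(dim=1)                   # discrete latent codes
        z_q = self.codebook(indices)                        # quantised vectors

        # VQ loss pulls codebook vectors toward (frozen) encoder outputs;
        # commitment loss keeps encoder outputs near their (frozen) codes.
        vq_loss = F.mse_loss(z_q, z_e.detach())
        commitment_loss = F.mse_loss(z_e, z_q.detach())
        aux_loss = vq_loss + self.beta * commitment_loss

        # Straight-through estimator: in the backward pass, gradients flow
        # from z_q to z_e as if quantisation were the identity function.
        z_q = z_e + (z_q - z_e).detach()
        return z_q, indices, aux_loss
```

At training time the full objective is the decoder's reconstruction loss plus the returned auxiliary loss, mirroring the three-term loss above: the reconstruction term trains the decoder, the VQ term trains the codebook, and the commitment term (together with the straight-through gradient) trains the encoder, since the argmin lookup itself is not differentiable.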