Autoencoding beyond pixels using a learned similarity metric


10 Feb 2016 | Anders Boesen Lindbo Larsen, Søren Kaae Sønderby, Hugo Larochelle, Ole Winther
This paper introduces an autoencoder that leverages learned representations to better measure similarity in data space. By combining a variational autoencoder (VAE) with a generative adversarial network (GAN), the method uses the GAN discriminator to measure sample similarity, replacing element-wise reconstruction errors with feature-wise errors. The resulting similarity measure better reflects the data distribution and is invariant to transformations such as translation. Applied to face images, the method outperforms VAEs trained with element-wise similarity measures in terms of visual fidelity, and it learns an embedding in which high-level visual features can be modified using simple arithmetic.

Element-wise similarity metrics (such as squared pixel error) are poorly suited to complex data distributions. The proposed hybrid VAE/GAN model instead uses hidden representations of the GAN discriminator as a learned similarity measure for reconstruction, combining the advantages of both model families. The model is trained with a triple criterion comprising the prior, the learned similarity measure, and the GAN objective.
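In the paper's notation (encoder Enc, decoder Dec, discriminator Dis, with Dis_l denoting the activations of the discriminator's l-th layer), the triple criterion can be written as:

```latex
% Triple criterion: prior + feature-wise likelihood + GAN objective
\mathcal{L} \;=\; \mathcal{L}_{\mathrm{prior}}
  \;+\; \mathcal{L}_{\mathrm{llike}}^{\mathrm{Dis}_l}
  \;+\; \mathcal{L}_{\mathrm{GAN}}
\\[4pt]
\mathcal{L}_{\mathrm{prior}} = D_{\mathrm{KL}}\big(q(z \mid x)\,\big\|\,p(z)\big)
\\
\mathcal{L}_{\mathrm{llike}}^{\mathrm{Dis}_l}
  = -\,\mathbb{E}_{q(z \mid x)}\big[\log p\big(\mathrm{Dis}_l(x) \mid z\big)\big],
\quad
p\big(\mathrm{Dis}_l(x) \mid z\big)
  = \mathcal{N}\big(\mathrm{Dis}_l(x) \,\big|\, \mathrm{Dis}_l(\tilde{x}),\, I\big),
\;\; \tilde{x} \sim \mathrm{Dec}(z)
\\
\mathcal{L}_{\mathrm{GAN}}
  = \log \mathrm{Dis}(x) + \log\big(1 - \mathrm{Dis}(\mathrm{Dec}(z))\big)
```

Below is a minimal sketch of these losses, assuming PyTorch modules enc (returning posterior mean and log-variance), dec, and dis (sigmoid output), plus a hypothetical dis_l helper that returns the discriminator's l-th layer features; the names are illustrative, not taken from the authors' code.

```python
import torch
import torch.nn.functional as F

def vaegan_losses(x, enc, dec, dis, dis_l):
    # Encode: q(z|x) = N(mu, sigma^2), sampled via the reparameterization trick.
    mu, logvar = enc(x)
    z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
    x_tilde = dec(z)

    # L_prior: KL divergence between q(z|x) and the standard normal prior p(z).
    l_prior = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1).mean()

    # L_llike^{Dis_l}: Gaussian log-likelihood in discriminator feature space,
    # i.e. a feature-wise (not pixel-wise) reconstruction error.
    l_llike = F.mse_loss(dis_l(x_tilde), dis_l(x))

    # L_GAN: standard GAN terms on real samples, reconstructions, and samples
    # decoded from the prior (the paper also feeds Dec(z_p), z_p ~ p(z)).
    z_p = torch.randn_like(z)
    real = dis(x)                      # assumes dis outputs probabilities
    l_gan = (F.binary_cross_entropy(real, torch.ones_like(real))
             + F.binary_cross_entropy(dis(x_tilde), torch.zeros_like(real))
             + F.binary_cross_entropy(dis(dec(z_p)), torch.zeros_like(real)))
    return l_prior, l_llike, l_gan
```

Note that in the paper the three networks are not trained on a single summed loss: each network is updated with its own combination of terms, and the decoder balances the feature-wise reconstruction term against the GAN term with a weighting parameter γ.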
Experiments on the CelebA dataset show that the VAE/GAN model produces sharper and more natural images than plain VAEs and GANs. The model also disentangles factors of variation in the input distribution and discovers visual attributes in the latent space, enabling the generation of images with specific visual attributes and outperforming the other models on attribute-similarity tasks. The authors conclude that learned similarity measures are a promising step towards scaling up generative models to more complex data distributions: the method extends the VAE framework with the strengths of GANs, produces images of high visual fidelity, and its learned similarity measure enables the discovery of high-level visual features in the latent space.
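As an illustration of the latent arithmetic, the sketch below (using the same assumed enc/dec modules as above) estimates an attribute direction as the difference between the mean latent codes of images with and without the attribute, then shifts a new image's code along that direction before decoding. This mirrors the attribute-vector technique the paper describes, though the helper names are hypothetical.

```python
import torch

@torch.no_grad()
def attribute_vector(enc, x_with, x_without):
    # Attribute direction: mean latent code of images that show the
    # attribute minus the mean code of images that do not. The posterior
    # means returned by the encoder serve as the latent codes.
    z_with, _ = enc(x_with)
    z_without, _ = enc(x_without)
    return z_with.mean(dim=0) - z_without.mean(dim=0)

@torch.no_grad()
def add_attribute(enc, dec, x, attr_vec, strength=1.0):
    # Shift an image's latent code along the attribute direction and
    # decode; `strength` scales how strongly the attribute is expressed.
    z, _ = enc(x)
    return dec(z + strength * attr_vec)
```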