10 Feb 2016 | Anders Boesen Lindbo Larsen, Søren Kaae Sønderby, Hugo Larochelle, Ole Winther
The paper presents an autoencoder that leverages learned representations to better measure similarities in data space, particularly for images. By combining a variational autoencoder (VAE) with a generative adversarial network (GAN), the method uses the GAN discriminator to measure sample similarity, replacing element-wise errors with feature-wise errors. This approach captures the data distribution more effectively while offering invariance to transformations like translation. The method is applied to face images, demonstrating superior visual fidelity compared to VAEs with element-wise similarity measures. Additionally, the method learns an embedding where high-level abstract visual features can be modified using simple arithmetic, as shown in experiments with face images labeled with visual attribute vectors.
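The difference between element-wise and feature-wise errors can be illustrated with a small sketch. It is not the paper's model: `toy_features` is a hypothetical, hand-crafted stand-in for a hidden layer of the GAN discriminator (which in the paper is learned), and the 16x16 test image is synthetic. The sketch only shows why comparing pooled features can tolerate a translation that a pixel-wise comparison penalizes.

```python
import numpy as np

def elementwise_error(x, y):
    """Mean squared error computed pixel by pixel."""
    return float(np.mean((x - y) ** 2))

def toy_features(img):
    """Hypothetical stand-in for a discriminator hidden layer (assumption).

    Returns three globally pooled statistics: mean intensity and the mean
    absolute horizontal and vertical circular differences. Global pooling
    makes these statistics invariant to circular shifts of the image.
    """
    dx = np.abs(img - np.roll(img, 1, axis=1))  # horizontal edges
    dy = np.abs(img - np.roll(img, 1, axis=0))  # vertical edges
    return np.array([img.mean(), dx.mean(), dy.mean()])

def featurewise_error(x, y, features=toy_features):
    """Mean squared error computed between feature vectors."""
    return float(np.mean((features(x) - features(y)) ** 2))

# Synthetic image: a bright 4x4 square on a dark background.
img = np.zeros((16, 16))
img[4:8, 4:8] = 1.0

# The same image translated two pixels to the right (circular shift).
shifted = np.roll(img, 2, axis=1)

pixel_err = elementwise_error(img, shifted)
feature_err = featurewise_error(img, shifted)
print("element-wise error:", pixel_err)
print("feature-wise error:", feature_err)
```

The pixel-wise error is nonzero because the square no longer overlaps itself exactly, while the pooled features of the two images coincide. A learned discriminator representation would be only approximately invariant, not exactly so as in this toy case.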
The contributions include an unsupervised generative model that learns to encode, generate, and compare dataset samples, and the demonstration of disentangled factors of variation in the latent space.
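The latent-space arithmetic mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: the attribute direction is taken as the difference between the mean latent code of samples with an attribute and the mean of samples without it, the latent codes are synthetic random vectors rather than real encoder outputs, and the helper names `attribute_vector` and `apply_attribute` are hypothetical.

```python
import numpy as np

def attribute_vector(latents, has_attribute):
    """Difference between mean latent codes with and without an attribute."""
    latents = np.asarray(latents, dtype=float)
    mask = np.asarray(has_attribute, dtype=bool)
    return latents[mask].mean(axis=0) - latents[~mask].mean(axis=0)

def apply_attribute(z, direction, strength=1.0):
    """Move a latent code along an attribute direction."""
    return np.asarray(z, dtype=float) + strength * direction

# Synthetic latent codes (NOT real encoder outputs): 100 samples, 8 dimensions.
# By construction, the attribute shifts latent dimension 0 by 3.0.
rng = np.random.default_rng(0)
labels = np.array([True] * 50 + [False] * 50)
codes = rng.normal(size=(100, 8))
codes[labels, 0] += 3.0

direction = attribute_vector(codes, labels)

# Edit the code of a sample that lacks the attribute. In a full model, the
# edited code would then be passed through the decoder to produce an image.
z_edited = apply_attribute(codes[-1], direction)
print("dominant dimension of attribute vector:", int(np.argmax(np.abs(direction))))
```

With real data, `codes` would come from the encoder and `labels` from the dataset's attribute annotations; the `strength` parameter is an illustrative addition for scaling the edit.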