18 Jun 2016 | Aäron van den Oord, Nal Kalchbrenner, Oriol Vinyals, Lasse Espeholt, Alex Graves, Koray Kavukcuoglu
This paper explores conditional image generation using a new image density model based on the PixelCNN architecture. The model can be conditioned on any vector, including descriptive labels, tags, or latent embeddings produced by other networks. When conditioned on class labels from the ImageNet database, it generates diverse, realistic scenes of different animals, objects, landscapes, and structures. When conditioned on an embedding produced by a convolutional network from a single image of an unseen face, it generates a variety of new portraits with different facial expressions, poses, and lighting conditions.

The authors introduce a gated variant, Gated PixelCNN, that combines the strengths of PixelCNN and PixelRNN, matching PixelRNN's state-of-the-art log-likelihood on ImageNet with less training time. They also demonstrate that a conditional PixelCNN can serve as a powerful decoder in image autoencoders, and explore conditioning on diverse classes and on high-level image descriptions, showing the model's ability to capture invariances and generate varied samples. The paper concludes by discussing future directions, including generating new images from a single example and combining conditional PixelCNN with variational inference.
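The conditioning mechanism the paper describes works through a gated activation unit: the conditioning vector is linearly projected and added inside both the tanh and sigmoid branches of each layer, so the same label or embedding steers every layer of the network. Below is a minimal NumPy sketch of that gating, for a single feature vector rather than full convolutional feature maps; all variable names and dimensions are illustrative, not the authors' code.

```python
import numpy as np

def gated_activation(x_f, x_g, h, V_f, V_g):
    """Gated activation unit of Gated PixelCNN:
    y = tanh(x_f + V_f h) * sigmoid(x_g + V_g h),
    where x_f, x_g stand in for the outputs of the masked convolutions
    and h is the conditioning vector (e.g. a one-hot class label or a
    face embedding). V_f and V_g are learned projection matrices."""
    tanh_part = np.tanh(x_f + V_f @ h)
    sigmoid_part = 1.0 / (1.0 + np.exp(-(x_g + V_g @ h)))
    return tanh_part * sigmoid_part

# Toy usage: 4-dim features conditioned on a 3-dim label embedding.
rng = np.random.default_rng(0)
x_f = rng.normal(size=4)
x_g = rng.normal(size=4)
h = np.array([1.0, 0.0, 0.0])  # e.g. a one-hot class label
V_f = rng.normal(size=(4, 3))
V_g = rng.normal(size=(4, 3))
y = gated_activation(x_f, x_g, h, V_f, V_g)
print(y.shape)
```

Because tanh is bounded in (-1, 1) and the sigmoid acts as a per-unit gate in (0, 1), the conditioning vector can selectively open or close feature channels, which is how a single model produces class- or identity-specific samples.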