PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications

19 Jan 2017 | Tim Salimans, Andrej Karpathy, Xi Chen, Diederik P. Kingma
PixelCNN++ is an improved version of the PixelCNN, a powerful generative model with a tractable likelihood. The authors implemented PixelCNN and made several modifications that simplify the model and improve its performance: a discretized logistic mixture likelihood in place of the 256-way softmax, conditioning on whole pixels rather than sub-pixels, downsampling to efficiently capture structure at multiple resolutions, additional short-cut connections to speed up optimization, and dropout for regularization. State-of-the-art log-likelihood results on CIFAR-10 demonstrate the usefulness of these modifications.

The PixelCNN model was introduced by van den Oord et al. (2016b) and is a generative model of images with a tractable likelihood. The model fully factorizes the probability density function of an image x over all its sub-pixels as p(x) = ∏i p(xi | x<i). The conditional distributions p(xi | x<i) are parameterized by convolutional neural networks that share parameters. The PixelCNN is a powerful model because the functional form of these conditionals is very flexible, and it is computationally efficient because all conditionals can be evaluated in parallel on a GPU for an observed image x. Thanks to these properties, the PixelCNN represents the state of the art in generative modeling when evaluated in terms of log-likelihood. The authors developed their own implementation of PixelCNN, release it at https://github.com/openai/pixel-cnn in the hope that it will be useful to the broader community, and discuss their modifications in Section 2 before evaluating them experimentally in Section 3.
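The factorization p(x) = ∏i p(xi | x<i) can be illustrated with a toy sequential sampler. This is only a sketch: the `conditional` callable below is a hypothetical stand-in for the shared PixelCNN network (here assumed to output a distribution over 256 intensity values), not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_autoregressive(conditional, n_pixels):
    """Draw pixels one at a time, each conditioned on all previous ones,
    directly mirroring the factorization p(x) = prod_i p(x_i | x_<i)."""
    x = np.zeros(n_pixels, dtype=np.int64)
    for i in range(n_pixels):
        probs = conditional(x[:i])          # distribution over 256 intensity values
        x[i] = rng.choice(256, p=probs)
    return x

def toy_conditional(prefix):
    """Hypothetical placeholder for the shared convolutional network:
    it biases each pixel toward the mean of the previously sampled ones."""
    center = prefix.mean() if prefix.size else 128.0
    logits = -0.01 * (np.arange(256) - center) ** 2
    p = np.exp(logits - logits.max())
    return p / p.sum()

sample = sample_autoregressive(toy_conditional, 16)
```

Note the asymmetry the text alludes to: for an observed image, all conditionals can be evaluated in parallel (which makes training fast on a GPU), but sampling is inherently sequential, as in the loop above.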
The most important modifications to the PixelCNN architecture are the discretized logistic mixture likelihood, conditioning on whole pixels, downsampling, short-cut connections, and dropout regularization. Experiments on CIFAR-10 show that the model achieves state-of-the-art results in terms of log-likelihood and generates images with coherent global structure, and that it can additionally be conditioned on class labels while retaining strong log-likelihood performance. The authors also examine the effect of network depth and receptive-field size, finding that a PixelCNN with a small receptive field can achieve competitive generative modeling performance on CIFAR-10 as long as it has enough capacity. Ablation experiments test the effect of each modification: the discretized logistic mixture likelihood performs better than the softmax likelihood, the short-cut connections are important for training the model, and dropout is important for regularization and preventing overfitting. The authors conclude that these modifications together yield a simpler and better-performing model, as confirmed by the state-of-the-art log-likelihood results on CIFAR-10.
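The discretized logistic mixture likelihood mentioned above replaces the 256-way softmax by placing a continuous mixture of logistic distributions over intensity values and integrating each mixture over the integer bin around the observed value, with the edge bins 0 and 255 absorbing all mass below and above. The following is a minimal NumPy sketch under stated assumptions: the function name, tensor shapes, and the clipping constant are illustrative, not the authors' exact implementation.

```python
import numpy as np

def sigmoid(z):
    # logistic CDF, written via tanh for numerical stability
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def discretized_logistic_mixture_logprob(x, log_pi, mu, log_s):
    """Log-probability of integer pixel values x in [0, 255] under a
    mixture of K discretized logistics (shapes are illustrative).

    x:      (...,)   integer pixel values
    log_pi: (..., K) unnormalized mixture weights
    mu:     (..., K) component means, in pixel units
    log_s:  (..., K) log-scales of the logistic components
    """
    x = x[..., None].astype(np.float64)           # broadcast over components
    inv_s = np.exp(-log_s)
    # CDF of the underlying logistic at the two edges of the integer bin
    cdf_hi = sigmoid((x + 0.5 - mu) * inv_s)
    cdf_lo = sigmoid((x - 0.5 - mu) * inv_s)
    # edge bins: 0 absorbs all mass below, 255 all mass above
    cdf_hi = np.where(x > 254.5, 1.0, cdf_hi)
    cdf_lo = np.where(x < 0.5, 0.0, cdf_lo)
    prob = np.clip(cdf_hi - cdf_lo, 1e-12, 1.0)   # floor avoids log(0)
    # log-sum-exp over mixture components with normalized weights
    log_w = log_pi - np.logaddexp.reduce(log_pi, axis=-1, keepdims=True)
    return np.logaddexp.reduce(log_w + np.log(prob), axis=-1)

# probabilities over all 256 bins sum to one for any mixture
lp = discretized_logistic_mixture_logprob(
    np.arange(256),
    log_pi=np.array([0.0, 1.0]),
    mu=np.array([50.0, 200.0]),
    log_s=np.array([1.0, 2.0]),
)
```

Compared with a 256-way softmax, this parameterization needs far fewer output parameters per pixel, makes nearby intensity values share probability mass, and gives denser gradients, which is consistent with the ablation result that it trains better than the softmax likelihood.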