Pixel Recurrent Neural Networks


19 Aug 2016 | Aäron van den Oord, Nal Kalchbrenner, Koray Kavukcuoglu
This paper presents a deep neural network, the Pixel Recurrent Neural Network (PixelRNN), designed to sequentially predict pixels in an image along both spatial dimensions. The PixelRNN models the discrete probability of raw pixel values and encodes the complete set of dependencies in the image. Key architectural innovations include fast two-dimensional recurrent layers and the effective use of residual connections in deep recurrent networks. The authors achieve significantly better log-likelihood scores on natural images compared to previous state-of-the-art methods and provide benchmarks on the diverse ImageNet dataset. Samples generated from the model appear crisp, varied, and globally coherent.

The PixelRNNs are trained using a discrete softmax distribution and are evaluated on datasets such as MNIST, CIFAR-10, and ImageNet, demonstrating their ability to capture both local and long-range spatial dependencies. The paper also introduces the Multi-Scale PixelRNN, which combines an unconditional PixelRNN with one or more conditional PixelRNNs to generate larger images. Overall, the PixelRNNs show significant improvements over previous models in terms of log-likelihood and sample quality.
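The core idea above, sequentially predicting each pixel as a 256-way discrete softmax conditioned on all previously generated pixels in raster-scan order, can be sketched as follows. This is a minimal illustration, not the paper's architecture: `toy_logits` is a hypothetical placeholder standing in for the recurrent network, which in the actual model is a deep LSTM with two-dimensional recurrence and residual connections.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_logits(context_flat):
    # Hypothetical stand-in for the PixelRNN: returns unnormalized
    # scores over the 256 possible values of the next pixel, given
    # the flattened sequence of pixels generated so far.
    bias = context_flat.mean() if context_flat.size else 128.0
    return -0.01 * (np.arange(256) - bias) ** 2

def softmax(z):
    z = z - z.max()           # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def sample_image(h, w):
    # Generate pixels one at a time, left-to-right, top-to-bottom,
    # each drawn from a categorical (softmax) distribution conditioned
    # on every pixel sampled before it -- the autoregressive
    # factorization p(x) = prod_i p(x_i | x_1, ..., x_{i-1}).
    img = np.zeros((h, w), dtype=np.int64)
    for i in range(h):
        for j in range(w):
            context = img.ravel()[: i * w + j]   # pixels generated so far
            probs = softmax(toy_logits(context))
            img[i, j] = rng.choice(256, p=probs)
    return img

img = sample_image(4, 4)
print(img.shape)  # (4, 4), values in [0, 255]
```

Treating pixel intensities as a discrete distribution rather than a continuous one is what lets the model be trained with a standard cross-entropy loss and evaluated directly in log-likelihood.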