EnhanceNet: Single Image Super-Resolution Through Automated Texture Synthesis

30 Jul 2017 | Mehdi S. M. Sajjadi, Bernhard Schölkopf, Michael Hirsch
This paper proposes a novel approach to single image super-resolution (SISR) that combines automated texture synthesis with a perceptual loss to generate realistic textures, rather than optimizing for pixel-accurate reproduction of the ground truth. The method uses a feed-forward fully convolutional neural network in an adversarial training setting and achieves significant improvements in image quality at high magnification ratios. Extensive experiments on several datasets show that the approach yields state-of-the-art results in both quantitative and qualitative benchmarks.

The task of SISR is inherently ill-posed, as no unique solution exists: a large number of different high-resolution (HR) images give rise to the same low-resolution (LR) image when downsampled. This one-to-many mapping becomes worse at high magnification ratios, making SISR a highly intricate problem. Current state-of-the-art methods are still far from the fictional image enhancers seen in films such as Blade Runner: the loss of high-frequency information at large downsampling factors renders textured regions in super-resolved images blurry, overly smooth, and unnatural. The reason for this behavior lies in the objective function that current state-of-the-art methods employ: most systems minimize the pixel-wise mean squared error (MSE) between the HR ground truth image and its reconstruction from the LR observation, a measure that has been shown to correlate poorly with human perception of image quality. While easy to minimize, the optimal MSE estimator returns the mean of many possible solutions, which makes SISR results look unnatural and implausible. This regression-to-the-mean problem is well known in the context of super-resolution; however, modeling the high-dimensional, multi-modal distribution of natural images remains challenging.

In this work, we pursue a different strategy to improve the perceptual quality of SISR results. Using a fully convolutional neural network architecture, we propose a novel modification of recent texture synthesis networks in combination with adversarial training and perceptual losses to produce realistic textures at large magnification ratios. The method works on all RGB channels simultaneously and produces sharp results for natural images at a competitive speed. Trained with suitable combinations of losses, we reach state-of-the-art results both in terms of PSNR and under perceptual metrics.

We introduce several loss functions for training our network: a pixel-wise loss in image space, a perceptual loss in feature space, a texture matching loss, and an adversarial loss. The pixel-wise loss serves as a baseline, while the perceptual loss encourages the network to produce images whose feature representations match those of the ground truth. The texture matching loss ensures that the generated images have locally similar textures to the HR images, and the adversarial loss helps to produce realistic images that are difficult to distinguish from the original HR images.
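To make these loss terms concrete, the following is a minimal PyTorch sketch of a pixel-wise loss, a perceptual loss on VGG features, and a patch-based texture matching loss built from Gram matrices. The VGG-19 layer indices, the patch size, and the helper names are illustrative assumptions, not the exact configuration used in the paper.

import torch
import torch.nn.functional as F
from torchvision.models import vgg19

# Pretrained VGG-19 used as a fixed feature extractor for the perceptual and
# texture losses; its weights are never updated. Inputs are assumed to be
# ImageNet-normalized RGB tensors of shape (batch, 3, H, W).
vgg = vgg19(pretrained=True).features.eval()
for p in vgg.parameters():
    p.requires_grad = False

def vgg_features(x, layer_idx):
    """Run x through VGG-19 up to (and including) the given layer index."""
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i == layer_idx:
            break
    return x

def pixel_loss(sr, hr):
    # Baseline pixel-wise MSE in image space.
    return F.mse_loss(sr, hr)

def perceptual_loss(sr, hr, layer_idx=22):  # assumed layer: relu4_2
    # Compare feature representations instead of raw pixel values.
    return F.mse_loss(vgg_features(sr, layer_idx), vgg_features(hr, layer_idx))

def gram_matrix(feat):
    # Channel-wise feature correlations that summarize texture statistics.
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def texture_loss(sr, hr, layer_idx=11, patch=16):  # assumed layer: relu3_1
    # Match Gram matrices on local feature patches so that textures agree
    # locally, not only in their global statistics (patch size is an assumption).
    fs, fh = vgg_features(sr, layer_idx), vgg_features(hr, layer_idx)
    b, c, _, _ = fs.shape

    def to_patches(f):
        u = F.unfold(f, patch, stride=patch)          # (b, c*patch*patch, n)
        n = u.shape[-1]
        u = u.reshape(b, c, patch, patch, n)
        return u.permute(0, 4, 1, 2, 3).reshape(b * n, c, patch, patch)

    return F.mse_loss(gram_matrix(to_patches(fs)), gram_matrix(to_patches(fh)))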
The results show that the perceptual loss leads to more realistic images, while the adversarial loss pushes the outputs toward images that are difficult to distinguish from real HR photographs.
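The sketch below shows one way these terms could be combined in the adversarial training setting described above; the loss weights, optimizer handling, and the generator/discriminator interfaces are assumptions for illustration, and the perceptual_loss and texture_loss helpers are reused from the previous sketch.

import torch

bce = torch.nn.BCEWithLogitsLoss()

def generator_step(gen, disc, lr_img, hr_img, opt_g,
                   w_percep=1.0, w_texture=1.0, w_adv=1e-3):
    # Update the super-resolution network: the perceptual and texture terms
    # anchor the output to the ground truth, while the adversarial term pushes
    # it toward images the discriminator considers real. Weights are assumed.
    sr = gen(lr_img)
    logits = disc(sr)
    adv = bce(logits, torch.ones_like(logits))
    loss = (w_percep * perceptual_loss(sr, hr_img)
            + w_texture * texture_loss(sr, hr_img)
            + w_adv * adv)
    opt_g.zero_grad()
    loss.backward()
    opt_g.step()
    return loss.item()

def discriminator_step(gen, disc, lr_img, hr_img, opt_d):
    # Update the discriminator: real HR images should be classified as real,
    # super-resolved outputs as fake.
    with torch.no_grad():
        sr = gen(lr_img)
    real_logits, fake_logits = disc(hr_img), disc(sr)
    loss = (bce(real_logits, torch.ones_like(real_logits))
            + bce(fake_logits, torch.zeros_like(fake_logits)))
    opt_d.zero_grad()
    loss.backward()
    opt_d.step()
    return loss.item()

Keeping the adversarial weight small relative to the content terms, as in the assumed defaults above, is a common choice for stabilizing this kind of training.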