EnhanceNet: Single Image Super-Resolution Through Automated Texture Synthesis
30 Jul 2017
**Authors:** Mehdi S. M. Sajjadi, Bernhard Schölkopf, Michael Hirsch
**Institution:** Max Planck Institute for Intelligent Systems
**Abstract:**
Single image super-resolution (SISR) aims to infer a high-resolution (HR) image from a low-resolution (LR) input. Traditional metrics such as PSNR correlate poorly with human perception of image quality, and optimizing for them yields over-smoothed images that lack high-frequency textures. EnhanceNet proposes a novel approach combining automated texture synthesis with a perceptual loss, focusing on creating realistic textures rather than pixel-accurate reproduction of the ground truth. Using a fully convolutional neural network trained in an adversarial setting, EnhanceNet significantly improves image quality at high magnification ratios. Extensive experiments on several datasets demonstrate the effectiveness of the approach, achieving state-of-the-art results in both quantitative and qualitative benchmarks.
**Introduction:**
SISR is a challenging task due to the ill-posed nature of the problem, where multiple HR images can produce the same LR image. Current methods often produce blurry results and lack high-frequency textures due to the regression-to-the-mean problem. EnhanceNet addresses this by using a fully convolutional neural network architecture, combining texture synthesis, adversarial training, and perceptual losses to produce sharp, realistic textures at high magnification ratios.
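To see why pixel-wise losses cause blur, consider a toy example (hypothetical values, not from the paper): if two different HR patches are equally consistent with the same LR input, the prediction that minimizes mean squared error is their average, a smoothed patch that matches neither plausible texture.

```python
import numpy as np

# Two equally plausible HR explanations of the same LR input:
# an edge shifted one pixel apart (hypothetical values).
patch_a = np.array([0.0, 1.0, 1.0, 0.0])
patch_b = np.array([0.0, 0.0, 1.0, 1.0])

# The MSE-optimal prediction is the pixel-wise mean of all plausible
# targets -- a blurred edge that neither target actually contains.
mse_optimal = 0.5 * (patch_a + patch_b)
print(mse_optimal)  # [0.  0.5 1.  0.5]
```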
**Related Work:**
Recent SISR methods include exemplar-based models, dictionary-based approaches, and neural networks. While these methods have improved reconstruction accuracy, they often suffer from the regression-to-the-mean problem, yielding blurry, unnatural results. EnhanceNet introduces a novel texture matching loss to enforce locally similar textures, producing sharper and more natural outputs.
**Method:**
EnhanceNet's architecture follows the fully convolutional designs of Long et al. and Johnson et al., enabling efficient training and inference on images of arbitrary size. The network is trained with a combination of pixel-wise, perceptual, texture matching, and adversarial losses. The perceptual loss compares feature activations of a pre-trained VGG-19 network to capture high-level structure, while the texture matching loss matches Gram-matrix statistics of those features on local patches, encouraging the output to reproduce textures that are locally similar to the high-resolution target.
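A minimal PyTorch sketch of the two feature-based losses, assuming ImageNet-normalized image batches and an L2 distance in feature space; the layer cut-off and normalization below are illustrative choices, not the paper's exact settings (the paper additionally computes the texture loss over small local patches, and the adversarial loss is a standard GAN discriminator term omitted here):

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

# Frozen VGG-19 feature extractor, cut after relu4_1 (illustrative choice).
vgg = vgg19(pretrained=True).features[:21].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def perceptual_loss(sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
    """L2 distance between VGG-19 feature maps of output and target."""
    return F.mse_loss(vgg(sr), vgg(hr))

def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    """Channel-wise feature correlations, as in Gatys-style texture synthesis."""
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def texture_loss(sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
    """Match Gram statistics so the output's texture resembles the target's."""
    return F.mse_loss(gram_matrix(vgg(sr)), gram_matrix(vgg(hr)))
```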
**Evaluation:**
EnhanceNet is evaluated on standard benchmark datasets. The variant trained with a pixel-wise loss achieves state-of-the-art scores on quantitative metrics (PSNR, SSIM, IFC), while the perceptually trained variant trades some PSNR for markedly sharper, more realistic textures; a user study on ImageNet images confirms its superior perceptual quality compared to state-of-the-art methods.
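PSNR and SSIM are straightforward to reproduce with standard tooling; a quick sketch using scikit-image (IFC has no skimage implementation and is omitted; `channel_axis` requires scikit-image ≥ 0.19):

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(sr: np.ndarray, hr: np.ndarray) -> tuple[float, float]:
    """Score a super-resolved RGB image against ground truth, floats in [0, 1]."""
    psnr = peak_signal_noise_ratio(hr, sr, data_range=1.0)
    ssim = structural_similarity(hr, sr, data_range=1.0, channel_axis=-1)
    return psnr, ssim
```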
**Discussion and Future Work:**
EnhanceNet's limitations include the inability to match ground truth images pixel-wise and occasional artifacts in the output. Future work may focus on increasing network depth, reducing computational cost, and enforcing temporal consistency for video super-resolution.