27 May 2024 | Yongsheng Yu, Ziyun Zeng, Hang Hua, Jianlong Fu, Jiebo Luo
PromptFix is a comprehensive framework that enables diffusion models to follow human instructions across a wide variety of image-processing tasks. It addresses two persistent challenges in instruction-based image generation and editing: preserving high-frequency details during denoising and handling degraded inputs. The key contributions are (1) a large-scale instruction-following dataset covering low-level tasks, image editing, and object creation; (2) a high-frequency guidance sampling method that maintains spatial details during denoising; and (3) an auxiliary prompting adapter that uses Vision-Language Models (VLMs) to enrich text prompts and improve task generalization. Experiments show that PromptFix outperforms previous methods across these tasks while maintaining comparable inference efficiency, and that it has superior zero-shot capability on blind restoration and task-combination settings. The dataset and code are available at https://github.com/yeates/PromptFix.

The framework is designed to handle degraded images, including severe cases such as low-resolution inputs, and adapts to different types of degradation; the VLM-derived auxiliary prompts also give the model an additional pathway to a precise semantic description of the target image. PromptFix is evaluated on three image-editing tasks (colorization, watermark removal, object removal) and four image-restoration tasks (dehazing, desnowing, super-resolution, and low-light enhancement), achieving superior perceptual pixel similarity and no-reference image-quality scores. Extensive experiments validate its robustness and versatility in low-level image processing.
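The high-frequency guidance sampling can be pictured as a classifier-guidance-style correction applied during denoising: at each step, the predicted clean latent is nudged so that its high-frequency band stays close to that of the (VAE-encoded) input image, which is what keeps fine spatial detail from washing out. The sketch below is a minimal illustration of that idea rather than the paper's exact formulation; the names `high_pass` and `hf_guided_correction`, the blur-residual filter, and the `guidance_scale` value are assumptions introduced here.

```python
import torch
import torch.nn.functional as F


def high_pass(latent: torch.Tensor, kernel_size: int = 5) -> torch.Tensor:
    """High-frequency band as the residual of a box blur (an assumed filter choice)."""
    pad = kernel_size // 2
    blurred = F.avg_pool2d(
        F.pad(latent, (pad, pad, pad, pad), mode="reflect"),
        kernel_size,
        stride=1,
    )
    return latent - blurred


def hf_guided_correction(
    pred_x0: torch.Tensor,       # model's predicted clean latent at the current step
    z_input: torch.Tensor,       # VAE latent of the (degraded) input image
    guidance_scale: float = 0.1, # hypothetical strength; the paper may schedule this
) -> torch.Tensor:
    """Nudge pred_x0 so its high-frequency content matches the input latent's."""
    z = pred_x0.detach().requires_grad_(True)
    loss = F.mse_loss(high_pass(z), high_pass(z_input))
    grad = torch.autograd.grad(loss, z)[0]
    return pred_x0 - guidance_scale * grad


if __name__ == "__main__":
    z_input = torch.randn(1, 4, 64, 64)  # stand-in latent of the input image
    pred_x0 = torch.randn(1, 4, 64, 64)  # stand-in predicted clean latent
    corrected = hf_guided_correction(pred_x0, z_input)
    print(corrected.shape)  # torch.Size([1, 4, 64, 64])
```

In a full sampler this correction would be folded into each DDIM/DDPM step before re-noising, so the high-frequency constraint is enforced throughout the trajectory rather than only at the end.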
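The auxiliary prompting adapter can likewise be thought of as a second text-conditioning stream: a VLM describes the degraded input, and that description is encoded alongside the user's instruction so the diffusion model sees both the edit intent and a semantic account of the image content. The sketch below is a minimal interpretation under stated assumptions; the `Captioner`/`TextEncoder` interfaces and the simple sequence-axis concatenation are hypothetical placeholders, not PromptFix's actual adapter architecture.

```python
from typing import Protocol

import torch


class Captioner(Protocol):
    """Hypothetical VLM interface: returns a text description of an image tensor."""

    def describe(self, image: torch.Tensor) -> str: ...


class TextEncoder(Protocol):
    """Hypothetical text encoder: maps a string to a (seq_len, dim) embedding."""

    def __call__(self, text: str) -> torch.Tensor: ...


def build_conditioning(
    instruction: str,
    image: torch.Tensor,
    vlm: Captioner,
    text_encoder: TextEncoder,
) -> torch.Tensor:
    """Fuse the user instruction with a VLM-generated auxiliary prompt.

    Concatenating the two embeddings along the sequence axis, so both feed the
    denoiser's cross-attention, is an assumption made for this sketch.
    """
    aux_prompt = vlm.describe(image)              # e.g. "a hazy photo of a harbor"
    inst_emb = text_encoder(instruction)          # (L1, D)
    aux_emb = text_encoder(aux_prompt)            # (L2, D)
    return torch.cat([inst_emb, aux_emb], dim=0)  # (L1 + L2, D) context
```

Conditioning on both streams is what lets the model fall back on the VLM's description when the user instruction is underspecified, which is consistent with the zero-shot blind-restoration behavior reported in the paper.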