PromptFix: You Prompt and We Fix the Photo

27 May 2024 | Yongsheng Yu, Ziyun Zeng, Hang Hua, Jianlong Fu, Jiebo Luo
PromptFix is a comprehensive framework designed to enable diffusion models to follow human instructions for a wide range of image-processing tasks. The framework addresses the limitations of existing methods by constructing a large-scale instruction-following dataset and proposing two key techniques: High-frequency Guidance Sampling (HGS) and an Auxiliary Prompt Module.

1. **Dataset Construction**: PromptFix collects a large dataset of $\sim 1.34$ million input-goal-instruction triplets, covering various low-level tasks such as image inpainting, object creation, dehazing, colorization, super-resolution, low-light enhancement, snow removal, and watermark removal.
2. **High-frequency Guidance Sampling**: This technique helps preserve high-frequency details in unprocessed areas by using a low-pass filter operator to calculate a fidelity constraint and integrating VAE skip-connect features during inference with a lightweight LoRA fusion.
3. **Auxiliary Prompt Module**: This module enhances text prompts by utilizing Vision-Language Models (VLMs) to provide more descriptive text prompts, improving the model's task generalization. It includes semantic captions and defect descriptions for degraded images, enhancing the model's ability to handle severe image degradation.

Experimental results show that PromptFix outperforms previous methods in various image-processing tasks, achieving superior zero-shot capabilities in blind restoration and combination tasks. The model also exhibits comparable inference efficiency to baseline models. The dataset and code are available at <https://github.com/yeates/PromptFix>.
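The idea behind a high-frequency fidelity constraint can be illustrated with a small NumPy sketch. This is not the PromptFix implementation: it assumes one plausible reading of the summary, in which a low-pass filter (here a simple box blur) separates out an image's high-frequency residual, and the constraint penalizes differences between the residuals of the input and the output. The function names `box_low_pass`, `high_frequency`, and `fidelity_penalty`, the choice of a box blur, and the `radius` parameter are all hypothetical.

```python
import numpy as np


def box_low_pass(image: np.ndarray, radius: int = 2) -> np.ndarray:
    """Low-pass filter a 2-D grayscale image with a (2*radius+1)^2 box blur.

    Assumption: a box blur stands in for whatever low-pass operator is used.
    """
    size = 2 * radius + 1
    kernel = np.ones(size) / size
    # Edge-replicate padding keeps the output the same shape as the input.
    padded = np.pad(image, radius, mode="edge")
    # Separable blur: filter along rows first, then along columns.
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded
    )
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, rows
    )


def high_frequency(image: np.ndarray, radius: int = 2) -> np.ndarray:
    """High-frequency residual: what the low-pass filter removes."""
    return image - box_low_pass(image, radius)


def fidelity_penalty(source: np.ndarray, output: np.ndarray, radius: int = 2) -> float:
    """Mean squared difference between the high-frequency residuals.

    Hypothetical stand-in for a fidelity constraint: it is zero when the
    output keeps the source's fine detail and grows as detail is lost.
    """
    diff = high_frequency(source, radius) - high_frequency(output, radius)
    return float(np.mean(diff ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    source = rng.random((32, 32))
    # An over-smoothed output loses detail, so its penalty is positive.
    print(fidelity_penalty(source, source))
    print(fidelity_penalty(source, box_low_pass(source)))
```

In a diffusion sampler, a term of this kind would be evaluated on decoded images and used to steer sampling toward outputs that retain the source's texture in regions that should remain unchanged; how PromptFix combines it with VAE skip-connect features and LoRA fusion is not specified in this summary.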