28 May 2024 | Yuhao Liu, Zhanghan Ke, Fang Liu, Nanxuan Zhao, Rynson W.H. Lau
**Diff-Plugin: Revitalizing Details for Diffusion-based Low-level Tasks**
Diffusion models have achieved significant progress in image synthesis, but they struggle with low-level vision tasks that require detail preservation, owing to the inherent randomness of the diffusion process. To address this, the authors propose *Diff-Plugin*, a framework that enables a pre-trained diffusion model to perform various low-level tasks while maintaining high-fidelity results. The framework consists of two main components: a lightweight Task-Plugin module and a Plugin-Selector.
1. **Task-Plugin Module**: This module injects task-specific priors into the diffusion process. It includes a Task-Prompt Branch (TPB) and a Spatial Complement Branch (SCB). The TPB distills task-specific guidance, while the SCB leverages this guidance to enhance spatial detail preservation. This dual-branch design allows a single frozen diffusion model to handle diverse low-level tasks with high fidelity (a minimal sketch follows this list).
2. **Plugin-Selector**: This component lets users select the appropriate Task-Plugin from a text input. It uses multi-task contrastive learning to align visual embeddings with task-specific text inputs, making the framework robust and user-friendly (a simplified selection sketch also follows the list).
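
To make the dual-branch design concrete, here is a minimal PyTorch-style sketch of a Task-Plugin. It assumes only the summary above: the TPB distills a compact task prompt from an image feature (e.g., a CLIP image embedding), and the SCB uses that prompt to modulate spatial features extracted from the input image. All class names, feature dimensions, and the injection points into the frozen diffusion UNet (e.g., cross-attention for the prompt, residual addition for the spatial prior) are hypothetical illustrations, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TaskPromptBranch(nn.Module):
    """TPB sketch: distills a small set of task-specific guidance tokens
    from a global image feature (shapes are illustrative)."""
    def __init__(self, in_dim=1024, prompt_dim=768, num_tokens=4):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(in_dim, prompt_dim),
            nn.GELU(),
            nn.Linear(prompt_dim, prompt_dim * num_tokens),
        )
        self.num_tokens, self.prompt_dim = num_tokens, prompt_dim

    def forward(self, img_feat):                 # img_feat: (B, in_dim)
        p = self.proj(img_feat)
        return p.view(-1, self.num_tokens, self.prompt_dim)   # task prompt tokens

class SpatialComplementBranch(nn.Module):
    """SCB sketch: extracts spatial features from the input image and
    modulates them with the task prompt to preserve fine details."""
    def __init__(self, prompt_dim=768, feat_dim=320):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, padding=1), nn.SiLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1),
        )
        self.modulate = nn.Linear(prompt_dim, feat_dim)

    def forward(self, image, task_prompt):       # image: (B, 3, H, W)
        feat = self.encoder(image)
        scale = self.modulate(task_prompt.mean(dim=1))         # pooled prompt -> channel scale
        return feat * scale[:, :, None, None]                  # guidance-modulated spatial prior

class TaskPlugin(nn.Module):
    """Lightweight plugin returning (task prompt, spatial prior); the frozen
    pre-trained diffusion UNet would consume these, e.g. via cross-attention
    and residual addition respectively (an assumption for this sketch)."""
    def __init__(self):
        super().__init__()
        self.tpb = TaskPromptBranch()
        self.scb = SpatialComplementBranch()

    def forward(self, image, img_feat):
        prompt = self.tpb(img_feat)
        spatial = self.scb(image, prompt)
        return prompt, spatial
```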
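The Plugin-Selector's multi-task contrastive alignment can likewise be sketched. The loss below pulls each image embedding towards the text embedding of its own task and pushes it away from the other tasks; the selection rule picks the plugin whose task text embedding best matches the user's instruction. Both functions are simplified, hypothetical readings of the summary, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def multi_task_contrastive_loss(visual_emb, text_emb, task_ids, temperature=0.07):
    """visual_emb: (B, D) image embeddings; text_emb: (T, D), one per task;
    task_ids: (B,) ground-truth task index per image.
    InfoNCE-style objective aligning visual embeddings with task text embeddings."""
    v = F.normalize(visual_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.t() / temperature            # (B, T) similarity to every task text
    return F.cross_entropy(logits, task_ids)

@torch.no_grad()
def select_plugin(user_text_emb, task_text_embs):
    """At inference, embed the user's instruction (D,) and return the index of the
    Task-Plugin whose task text embedding (T, D) is most similar (cosine)."""
    u = F.normalize(user_text_emb, dim=-1)
    t = F.normalize(task_text_embs, dim=-1)
    return int((u @ t.t()).argmax(dim=-1))
```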
Experiments on eight low-level vision tasks demonstrate the effectiveness of *Diff-Plugin*. The method outperforms existing diffusion and regression-based methods, both visually and quantitatively, and shows scalability across different dataset sizes. The framework is stable, schedulable, and supports robust training, making it a promising tool for text-driven low-level task processing.