28 May 2024 | Yuhao Liu, Zhanghan Ke, Fang Liu, Nanxuan Zhao, Rynson W.H. Lau
Diff-Plugin is a novel framework that enables a pre-trained diffusion model to perform a variety of low-level vision tasks while preserving its original generative capabilities. At its core is a Task-Plugin module with a dual-branch design that distills task-specific priors and guides the diffusion process toward preserving image content. A Plugin-Selector complements it by choosing the appropriate Task-Plugin from a user's text instruction, so images can be edited for low-level vision tasks via natural language.

The key contributions are threefold: it is the first framework that lets a pre-trained diffusion model handle diverse low-level tasks without sacrificing its generative prior; the Task-Plugin module injects task-specific priors into the diffusion process; and the Plugin-Selector selects the appropriate Task-Plugin from user text input, enabling text-driven low-level task processing, a capability absent in regression-based models.

Extensive experiments on eight low-level vision tasks show that Diff-Plugin outperforms existing diffusion-based methods both visually and quantitatively, and achieves competitive performance against regression-based methods, particularly in real-world scenarios. Training is stable, schedulable, and robust across datasets of different sizes, and the framework scales to new tasks without affecting already-trained plugins.
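To make the described architecture more concrete, below is a minimal PyTorch sketch of how a dual-branch Task-Plugin and a text-driven Plugin-Selector could fit together. All module names, layer choices, and dimensions (TaskPlugin, PluginSelector, prior_dim, spatial_channels, and so on) are illustrative assumptions rather than the authors' released implementation; the sketch only mirrors the idea of extracting task-specific priors from the input image and routing a text instruction to the matching plugin.

```python
# Illustrative sketch only: module and branch names are hypothetical stand-ins
# for the paper's dual-branch Task-Plugin and text-driven Plugin-Selector,
# not the authors' released API.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TaskPlugin(nn.Module):
    """Dual-branch module that distills task-specific priors from the input image
    and hands them to a frozen, pre-trained diffusion model as extra guidance."""
    def __init__(self, in_channels=3, prior_dim=768, spatial_channels=320):
        super().__init__()
        # Branch 1: compress the image into a compact task-prior vector.
        self.prior_branch = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.SiLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, prior_dim),
        )
        # Branch 2: keep a spatial feature map to help preserve image content.
        self.spatial_branch = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, padding=1), nn.SiLU(),
            nn.Conv2d(64, spatial_channels, 3, padding=1),
        )

    def forward(self, image):
        task_prior = self.prior_branch(image)       # (B, prior_dim)
        spatial_prior = self.spatial_branch(image)  # (B, C, H, W)
        return task_prior, spatial_prior


class PluginSelector(nn.Module):
    """Picks the Task-Plugin whose learned embedding best matches the user's
    text instruction (cosine similarity over a shared embedding space)."""
    def __init__(self, text_dim=512, num_plugins=8):
        super().__init__()
        self.plugin_embeddings = nn.Parameter(torch.randn(num_plugins, text_dim))

    def forward(self, text_embedding):
        sims = F.cosine_similarity(
            text_embedding.unsqueeze(1), self.plugin_embeddings.unsqueeze(0), dim=-1
        )
        return sims.argmax(dim=-1)  # index of the selected Task-Plugin


# Usage sketch: route an instruction to a plugin, then condition the diffusion model.
plugins = nn.ModuleList([TaskPlugin() for _ in range(8)])  # one plugin per low-level task
selector = PluginSelector()
image = torch.randn(1, 3, 512, 512)
text_embedding = torch.randn(1, 512)                       # e.g. from a CLIP text encoder
idx = selector(text_embedding).item()
task_prior, spatial_prior = plugins[idx](image)
# task_prior / spatial_prior would then be injected into the frozen diffusion
# U-Net (e.g. via cross-attention and feature addition) during sampling.
```

Because each Task-Plugin is a small, separately trained module and the base diffusion model stays frozen, adding a plugin for a new task does not disturb plugins that have already been trained, which is the scalability property highlighted above.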