Control Color: Multimodal Diffusion-based Interactive Image Colorization

16 Feb 2024 | Zhexin Liang, Zhaochen Li, Shangchen Zhou, Chongyi Li, Chen Change Loy
CtrlColor is a multi-modal diffusion-based image colorization framework that enables highly controllable and interactive colorization. It builds on a pre-trained Stable Diffusion (SD) model and introduces techniques that address color overflow, incorrect colors, and the limited flexibility of existing colorization methods. The framework supports both unconditional and conditional colorization, where conditions can be text prompts, user strokes, or exemplar images, used individually or in combination.

Two components underpin its color fidelity and control: a content-guided deformable autoencoder and streamlined self-attention guidance. User strokes are encoded directly into the diffusion process, enabling precise local color manipulation, while a deformable convolution layer in the autoencoder's decoder aligns the generated colors with the input's textures, reducing color overflow and inaccuracies.

Trained on a large-scale image dataset, CtrlColor demonstrates superior color richness, diversity, and visual quality compared with existing methods. Evaluated on multiple datasets, including ImageNet and COCO-Stuff, it shows significant improvements in colorfulness, FID, and CLIP score, and in user studies participants most often selected CtrlColor as the best option for colorization tasks. Its ability to handle varied colorization conditions while suppressing color overflow and inaccuracies makes it a promising solution for interactive image colorization.
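To make the deformable-alignment idea concrete, here is a minimal sketch in PyTorch of how a content-guided deformable block in a decoder might look, using torchvision's DeformConv2d. The channel sizes, the offset-prediction head, and the residual connection are illustrative assumptions, not the authors' implementation: the key point is that sampling offsets are predicted from grayscale content features, so generated colors are warped back onto the input's structures.

```python
# Sketch of content-guided deformable alignment (assumed design, not CtrlColor's code).
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class ContentGuidedDeformBlock(nn.Module):
    def __init__(self, channels: int = 64, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        # Offsets (2 per kernel sample) are predicted from the concatenation of
        # decoder (color) features and encoder (grayscale content) features.
        self.offset_head = nn.Conv2d(
            2 * channels, 2 * kernel_size * kernel_size, kernel_size, padding=pad
        )
        self.deform_conv = DeformConv2d(channels, channels, kernel_size, padding=pad)

    def forward(self, dec_feat: torch.Tensor, enc_feat: torch.Tensor) -> torch.Tensor:
        # Content features guide where each kernel tap samples from, pulling the
        # predicted colors toward the textures/edges of the grayscale input.
        offsets = self.offset_head(torch.cat([dec_feat, enc_feat], dim=1))
        return self.deform_conv(dec_feat, offsets) + dec_feat  # residual connection

if __name__ == "__main__":
    block = ContentGuidedDeformBlock(channels=64)
    dec = torch.randn(1, 64, 32, 32)  # decoder features (color branch)
    enc = torch.randn(1, 64, 32, 32)  # grayscale content features
    print(block(dec, enc).shape)      # torch.Size([1, 64, 32, 32])
```

Conditioning the offsets on the grayscale features, rather than on the color features alone, is what ties the deformation to the input content; the residual connection keeps the block a gentle correction rather than a full re-synthesis.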