It's All About Your Sketch: Democratising Sketch Control in Diffusion Models

20 Mar 2024 | Subhadeep Koley, Ayan Kumar Bhunia, Deeptanshu Sekhri, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song
This paper explores the potential of sketches for controlling diffusion models, addressing the so-far deceptive promise of direct sketch control in generative AI. The authors aim to democratise sketch control, enabling amateur sketches to generate precise images and delivering on the promise of "what you sketch is what you get." They propose an abstraction-aware framework that combines a sketch adapter, adaptive time-step sampling, and discriminative guidance from a pre-trained fine-grained sketch-based image retrieval (FG-SBIR) model to strengthen fine-grained sketch-photo association. The framework operates without textual prompts, relying solely on a simple, rough sketch. The authors validate their approach through extensive experiments, demonstrating its effectiveness in generating photorealistic images from abstract sketches.

The paper highlights the limitations of existing methods, which often struggle with freehand abstract sketches because their spatial-conditioning strategies lead to deformed outputs. The proposed method addresses this by converting the input sketch into equivalent textual embeddings, allowing more accurate and faithful image generation.
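To make this conditioning pathway concrete, below is a minimal PyTorch sketch of how an adapter of this kind could map a rasterised sketch to pseudo-text tokens that stand in for text-prompt embeddings in a frozen latent diffusion U-Net. The module names, the small CNN backbone, and the token shape (77 tokens of dimension 768, matching Stable Diffusion's text conditioning) are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn


class SketchAdapter(nn.Module):
    """Illustrative adapter: maps a rasterised sketch to a sequence of
    pseudo-text tokens in the diffusion model's text-embedding space,
    so generation can be driven by the sketch alone (no text prompt)."""

    def __init__(self, embed_dim: int = 768, num_tokens: int = 77):
        super().__init__()
        self.num_tokens = num_tokens
        self.embed_dim = embed_dim
        # A small CNN stands in for whatever visual backbone encodes the sketch.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=4, stride=2, padding=1), nn.GELU(),
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1), nn.GELU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Project the global sketch feature into `num_tokens` embeddings that
        # slot into the U-Net's cross-attention where text embeddings would go.
        self.to_tokens = nn.Linear(128, num_tokens * embed_dim)

    def forward(self, sketch: torch.Tensor) -> torch.Tensor:
        # sketch: (B, 1, H, W) binary or greyscale raster of the freehand sketch
        feat = self.encoder(sketch)                      # (B, 128)
        tokens = self.to_tokens(feat)                    # (B, num_tokens * embed_dim)
        return tokens.view(-1, self.num_tokens, self.embed_dim)


if __name__ == "__main__":
    adapter = SketchAdapter()
    cond = adapter(torch.randn(2, 1, 256, 256))
    print(cond.shape)  # torch.Size([2, 77, 768])
```

In a setup like this, only the adapter (and any guidance heads) would typically be optimised while the diffusion backbone stays frozen, which keeps the conditioning mechanism lightweight.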
The framework also introduces abstraction-aware time-step sampling, adjusting the sampling distribution according to the abstraction level of the input sketch. This ensures that the denoising process adapts to the complexity of the sketch, improving both photorealism and sketch fidelity.

The authors compare their method with state-of-the-art diffusion- and GAN-based models, showing superior performance in generation quality, sketch fidelity, and user studies. They also demonstrate generalisation across different datasets and stroke styles, as well as robustness to noisy and partially complete sketches. The method further enables fine-grained semantic editing, allowing local changes in the sketch domain to be transferred seamlessly to the output image. The paper concludes that the approach significantly advances the democratisation of sketch control in diffusion models, making it accessible to non-experts.
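As a toy illustration of the abstraction-aware time-step sampling summarised above, the snippet below skews the training time-step distribution by a per-sketch abstraction score. The score's source (e.g., stroke statistics or a retrieval-model confidence) and the exact skewing function are assumptions made for illustration; the paper defines its own sampling scheme.

```python
import torch


def sample_timesteps(abstraction_score: torch.Tensor,
                     num_train_steps: int = 1000) -> torch.Tensor:
    """Skew the training time-step distribution by sketch abstraction.

    abstraction_score: (B,) values in [0, 1]; 0 = detailed sketch,
    1 = highly abstract. Hypothetical heuristic: the more abstract the
    sketch, the more sampling is biased toward larger (noisier) steps,
    so the conditioning acts mainly on coarse structure.
    """
    t = torch.arange(num_train_steps, dtype=torch.float32)             # (T,)
    # Linear ramp whose tilt grows with abstraction: uniform for score 0,
    # increasingly weighted toward late steps for score 1.
    weights = 1.0 + abstraction_score[:, None] * (t / num_train_steps)  # (B, T)
    probs = weights / weights.sum(dim=1, keepdim=True)
    return torch.multinomial(probs, num_samples=1).squeeze(1)           # (B,)


if __name__ == "__main__":
    # Example: one detailed sketch (0.1) and one very abstract sketch (0.9).
    print(sample_timesteps(torch.tensor([0.1, 0.9])))
```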