20 Mar 2024 | Subhadeep Koley, Ayan Kumar Bhunia, Deeptanshu Sekhri, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song
This paper addresses the limitations of current sketch-to-image (S2I) methods, particularly in generating precise images from freehand sketches. The authors propose an abstraction-aware framework that democratizes sketch control, enabling amateur sketches to produce high-fidelity images without the need for detailed textual prompts. The key contributions include:
1. **Democratizing Sketch Control**: The method allows real amateur sketches to generate accurate images, fulfilling the promise of "what you sketch is what you get."
2. **Abstraction-Aware Framework**: The framework sidesteps the limitations of text prompts and spatial conditioning by converting sketches into fine-grained textual embeddings that guide the denoising process via cross-attention.
3. **Discriminative Guidance**: The system incorporates discriminative guidance from a pre-trained fine-grained sketch-based image retrieval model to ensure fine-grained sketch-photo association.
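The embedding-plus-cross-attention idea in point 2 can be illustrated with a minimal toy sketch (an assumption for illustration only, not the authors' code): a hypothetical "sketch adapter" projects sketch features into the textual-embedding space, and a single-query scaled dot-product cross-attention step shows how those embeddings can steer the denoiser.

```python
# Toy illustration of sketch-adapter + cross-attention conditioning.
# All numbers, matrix shapes, and function names are made up.
import math

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sketch_adapter(sketch, token_weights):
    """Hypothetical adapter: one projection matrix per output token,
    mapping raw sketch features into the text-conditioning space."""
    return [matvec(W, sketch) for W in token_weights]

def cross_attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(len(query))
              for key in keys]
    m = max(scores)                      # subtract max for stability
    exp_s = [math.exp(s - m) for s in scores]
    z = sum(exp_s)
    attn = [e / z for e in exp_s]        # softmax over keys
    return [sum(a * v[i] for a, v in zip(attn, values))
            for i in range(len(values[0]))]

# Made-up numbers: a 3-dim sketch feature and two 4x3 projections.
sketch = [0.2, 0.9, 0.1]
token_weights = [
    [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0.5, 0.5, 0.5]],
    [[0, 1, 0], [1, 0, 0], [0.5, 0, 0.5], [0, 0, 1]],
]
tokens = sketch_adapter(sketch, token_weights)

# The denoiser's query attends over the sketch-derived tokens, so
# sketch information flows into each denoising update.
query = [0.1, 0.4, 0.0, 0.2]
context = cross_attention(query, tokens, tokens)
```

Because the softmax weights are a convex combination, the returned context vector always lies between the sketch tokens componentwise; this is the sense in which the sketch, rather than a text prompt, supplies the conditioning signal.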
The authors conduct a pilot study to highlight the necessity of their approach, identifying that existing models struggle with spatial conditioning, leading to deformed outputs. Their method operates seamlessly during inference, using a sketch adapter, adaptive time-step sampling, and discriminative guidance. Extensive experiments validate the effectiveness of their approach, demonstrating superior generation quality and sketch fidelity compared to state-of-the-art methods.
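The discriminative-guidance idea can be reduced to a toy stand-in (an assumption for illustration, not the paper's implementation): score candidate generations by their embedding similarity to the sketch, as a pre-trained FG-SBIR model would, and prefer the best-matching one.

```python
# Toy proxy for discriminative guidance: cosine-similarity reranking.
# Embeddings and candidate vectors below are hypothetical numbers.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical embeddings: one for the input sketch, several for
# candidate denoised images.
sketch_emb = [0.55, 0.55, 0.125, 0.35]
candidates = [[0.3, 0.8, 0.2, 0.5], [0.9, 0.1, 0.1, 0.0]]

# Guidance here is reduced to picking the best candidate; in a
# diffusion model the same similarity score would instead be
# differentiated to nudge each denoising step toward outputs that the
# retrieval model judges sketch-faithful.
best = max(candidates, key=lambda c: cosine(c, sketch_emb))
```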