May 11–16, 2024 | Zhijie Wang, Yuheng Huang, Da Song, Lei Ma, Tianyi Zhang
**PromptCharm: Text-to-Image Generation through Multi-modal Prompting and Refinement**
**Authors:** Zhijie Wang, Yuheng Huang, Da Song, Lei Ma, Tianyi Zhang
**Abstract:**
PromptCharm is a mixed-initiative system that facilitates text-to-image creation through multi-modal prompt engineering and refinement. It addresses the difficulty novice users have in crafting effective text prompts for state-of-the-art models such as Stable Diffusion. PromptCharm automatically refines the user's initial prompt with the Promptist model, supports exploration of different image styles, and visualizes the model's attention so that users can understand and adjust how strongly specific keywords are weighted. Users can further refine a generated image through attention adjustment or image inpainting. Two user studies demonstrated that PromptCharm significantly improved image quality and alignment with user expectations compared to baseline systems without interactive features.
**Key Contributions:**
- PromptCharm: A mixed-initiative system for text-to-image creation.
- Visualizations, interaction designs, and implementations for interactive prompt engineering.
- Two user studies showing improved image quality and user satisfaction.
**Design Rationale:**
- **Automated Prompt Refinement:** Uses the Promptist model to refine initial prompts.
- **Balancing Automation and User Control:** Provides a multi-modal prompting interface for exploring different image styles.
- **Supporting Image Style Exploration:** Allows users to explore and select different modifiers.
- **Version Control:** Helps users track iterations and compare changes.
- **Model Attention Explanations:** Provides visualizations to explain the model's attention.
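The version-control rationale above can be illustrated with a small sketch. It assumes a prompt is a comma-separated list of a subject plus style modifiers, and it uses a hypothetical `PromptHistory` class that is not part of PromptCharm; the summary does not describe how the system actually stores or compares iterations.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class PromptHistory:
    """Hypothetical in-memory store of prompt versions (illustration only)."""

    versions: list[str] = field(default_factory=list)

    def commit(self, prompt: str) -> int:
        """Save a prompt and return its version index."""
        self.versions.append(prompt)
        return len(self.versions) - 1

    def diff(self, old: int, new: int) -> list[str]:
        """Return added ('+ ') and removed ('- ') comma-separated phrases."""
        a = [p.strip() for p in self.versions[old].split(",")]
        b = [p.strip() for p in self.versions[new].split(",")]
        # ndiff also emits unchanged ('  ') and hint ('? ') lines; keep only changes.
        return [d for d in difflib.ndiff(a, b) if d[0] in "+-"]


history = PromptHistory()
v0 = history.commit("a wolf in the forest")
v1 = history.commit("a wolf in the forest, oil painting, highly detailed")
changes = history.diff(v0, v1)
print(changes)
```

Comparing phrases rather than characters keeps the diff aligned with how modifiers are added or removed between iterations.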
**Implementation:**
- **Web Application:** Built using Material UI and Python Flask.
- **Machine Learning Models:** Implemented with PyTorch and Transformers.
- **Diffusion Model:** Stable Diffusion v2-1.
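As a rough illustration of the attention adjustment mentioned in the abstract, the sketch below boosts one keyword's share of attention and renormalizes. It is plain Python rather than PyTorch, the scores and the `factors` multipliers are made-up values, and it assumes distinct keywords; the real mechanism operates inside the diffusion model and is not detailed in this summary.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reweight_attention(tokens, scores, factors):
    """Scale each keyword's attention weight by a factor, then renormalize.

    tokens  -- distinct prompt keywords
    scores  -- raw attention scores, one per keyword (illustrative values)
    factors -- hypothetical per-keyword multipliers; missing keywords use 1.0
    """
    weights = softmax(scores)
    scaled = [w * factors.get(t, 1.0) for t, w in zip(tokens, weights)]
    total = sum(scaled)
    return {t: s / total for t, s in zip(tokens, scaled)}


tokens = ["wolf", "forest", "moonlight"]
scores = [2.0, 1.0, 0.5]
before = reweight_attention(tokens, scores, {})
after = reweight_attention(tokens, scores, {"moonlight": 3.0})
print(before)
print(after)
```

Because the weights are renormalized to sum to one, raising one keyword's factor necessarily lowers the relative share of the others.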
**User Study:**
- **Participants:** 12 participants with varying levels of experience.
- **Tasks:** Three tasks to replicate target images.
- **Results:** PromptCharm significantly improved image quality and user satisfaction compared to baselines.