PromptCharm: Text-to-Image Generation through Multi-modal Prompting and Refinement

PROMPTCHARM is a mixed-initiative system that facilitates text-to-image generation through multi-modal prompting and refinement. It targets the prompt-engineering challenges that novice users face: it automatically refines and optimizes an initial prompt, supports exploration of different image styles (modifier exploration), and visualizes model attention to help users refine their prompts and images. Users can also adjust the model's attention to specific keywords or remove undesired parts of an image through image inpainting, and version control helps them track their image creations during iterative prompting and refinement.

The design of PROMPTCHARM is motivated by the need to balance automation and user control, provide explanations for generated content, and support iterative refinement of images. The system uses the Stable Diffusion model as its text-to-image generation pipeline and was implemented as a web application using Python Flask and PyTorch.

PROMPTCHARM was evaluated through two user studies. In a study with 24 participants, it enabled users to create images with higher quality and better alignment with their expectations compared to baseline tools, and the results showed a significant improvement in the quality and aesthetics of the generated images.
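To make the idea of adjusting model attention to specific keywords more concrete, the following is a minimal sketch and not PROMPTCHARM's actual implementation. It assumes that each prompt token has a single non-negative attention score and that a user supplies a multiplier per keyword; the function name `adjust_attention` and its parameters are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def adjust_attention(
    tokens: List[str],
    attention: List[float],
    multipliers: Dict[str, float],
) -> List[float]:
    """Scale per-token attention by user-chosen multipliers, then renormalize.

    Hypothetical helper: the real system works inside the diffusion model,
    whereas this sketch only operates on a plain list of scores.
    """
    if len(tokens) != len(attention):
        raise ValueError("tokens and attention must have the same length")

    # Tokens without an explicit multiplier keep their original score.
    scaled = [a * multipliers.get(t, 1.0) for t, a in zip(tokens, attention)]

    total = sum(scaled)
    if total <= 0:
        raise ValueError("adjusted attention must have positive total mass")

    # Renormalize so the adjusted scores sum to 1 again.
    return [s / total for s in scaled]


if __name__ == "__main__":
    tokens = ["a", "wolf", "forest"]
    attention = [0.2, 0.4, 0.4]
    # Emphasize "wolf" by doubling its score before renormalizing.
    print(adjust_attention(tokens, attention, {"wolf": 2.0}))
```

Renormalizing after scaling means that emphasizing one keyword necessarily reduces the relative share of the others, which is one simple way to model the trade-off a user makes when moving an attention control.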

PROMPTCHARM: Text-to-Image Generation through Multi-modal Prompting and Refinement

May 11–16, 2024 | Zhijie Wang, Yuheng Huang, Da Song, Lei Ma, Tianyi Zhang