18 Jan 2024 | Jie Qin, Jie Wu, Weifeng Chen, Yuxi Ren, Huixia Li, Hefeng Wu, Xuefeng Xiao, Rui Wang, Shilei Wen
DiffusionGPT is a unified text-to-image generation system that uses Large Language Models (LLMs) to handle diverse input prompts and integrate domain-expert models. It addresses the limitations of current text-to-image systems by parsing several input types (prompt-based, instruction-based, inspiration-based, and hypothesis-based) and selecting the most suitable generative model through a Tree-of-Thought (ToT) structure. DiffusionGPT constructs domain-specific trees over the available models from prior knowledge and enriches the ToT with human feedback through Advantage Databases, which improves its ability to generate high-quality images across diverse domains. Extensive experiments and comparisons demonstrate the effectiveness of DiffusionGPT, showing superior performance in image synthesis compared to traditional stable diffusion models. The system is training-free, versatile, and efficient, making it a promising solution for community development in image generation.
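The workflow summarized here (parse the prompt, search a tree of models, then use human feedback to pick among the candidates) can be illustrated with a toy sketch. This is not the authors' implementation: keyword matching stands in for the LLM, and every name below (`Node`, `parse_prompt`, `search_tree`, `select_model`, the model names, and the feedback scores) is a hypothetical placeholder chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A category in the model tree; leaves carry candidate model names."""
    name: str
    keywords: set[str] = field(default_factory=set)
    children: list["Node"] = field(default_factory=list)
    models: list[str] = field(default_factory=list)


def parse_prompt(text: str) -> str:
    """Stand-in for LLM prompt parsing: strip an instruction-style prefix."""
    prefix = "generate an image of "
    lowered = text.lower().strip()
    return lowered[len(prefix):] if lowered.startswith(prefix) else lowered


def search_tree(root: Node, prompt: str) -> list[str]:
    """Descend from the root, following the child with the most keyword hits."""
    words = set(prompt.split())
    node = root
    while node.children:
        node = max(node.children, key=lambda c: len(c.keywords & words))
    return node.models


def select_model(candidates: list[str], feedback: dict[str, float]) -> str:
    """Pick the candidate with the highest feedback score (0.0 if unrated)."""
    return max(candidates, key=lambda m: feedback.get(m, 0.0))


# Hypothetical two-branch tree and made-up feedback scores (assumptions).
tree = Node("root", children=[
    Node("photo", {"photo", "realistic"}, models=["realism-a", "realism-b"]),
    Node("anime", {"anime", "cartoon"}, models=["anime-a", "anime-b"]),
])
feedback = {"anime-a": 0.4, "anime-b": 0.9, "realism-a": 0.7}

prompt = parse_prompt("Generate an image of an anime girl with a sword")
candidates = search_tree(tree, prompt)
chosen = select_model(candidates, feedback)
print(chosen)
```

In a real system each of these steps would presumably be delegated to an LLM and the chosen model would then be run to generate the image; the sketch only shows how the three stages hand data to one another.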