The paper introduces TheaterGen, a training-free framework that integrates large language models (LLMs) and text-to-image (T2I) models to address the challenges of maintaining semantic and contextual consistency in multi-turn image generation. TheaterGen leverages LLMs as "Screenwriters" to manage a standardized prompt book, which includes character prompts and layout designs. This framework generates character images and extracts guidance information, which are then used in the reverse denoising process of T2I diffusion models to produce final images. The authors also introduce CMIGBench, a new benchmark with 8000 multi-turn instructions, to evaluate semantic and contextual consistency in multi-turn image generation. Extensive experiments show that TheaterGen outperforms state-of-the-art methods, achieving significant improvements in character-character similarity and text-image similarity. The paper discusses the contributions, related work, methodological details, and experimental results, highlighting the effectiveness of TheaterGen in generating consistent and high-quality images in multi-turn scenarios.
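
To make the "prompt book" idea more concrete, the following is a minimal sketch of how per-character prompts and layouts might be tracked across turns. It is not the authors' implementation: the summary does not specify the prompt book's format, so the class names (`PromptBook`, `CharacterEntry`), the character IDs, the normalized bounding-box layout, and the `add`/`move`/`remove` operations are all assumptions made for illustration. In the framework as described, an LLM would decide which update to apply for each user instruction; here the updates are called by hand.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical layout format: (x, y, width, height), normalized to [0, 1].
BBox = Tuple[float, float, float, float]


@dataclass
class CharacterEntry:
    """One character in the scene: its text prompt and its layout box."""
    prompt: str
    bbox: BBox


@dataclass
class PromptBook:
    """Hypothetical structured state carried across dialogue turns."""
    background: str = ""
    characters: Dict[str, CharacterEntry] = field(default_factory=dict)

    def add(self, char_id: str, prompt: str, bbox: BBox) -> None:
        # A new character gets its own stable ID and prompt.
        self.characters[char_id] = CharacterEntry(prompt, bbox)

    def move(self, char_id: str, bbox: BBox) -> None:
        # Only the layout changes; the prompt is untouched, so the
        # character's description stays the same from turn to turn.
        self.characters[char_id].bbox = bbox

    def remove(self, char_id: str) -> None:
        self.characters.pop(char_id, None)

    def scene_prompt(self) -> str:
        # Flatten the book into a single text prompt for the T2I model.
        parts = [self.background] if self.background else []
        parts += [c.prompt for c in self.characters.values()]
        return ", ".join(parts)


# Three turns of edits, applied manually in place of an LLM "Screenwriter".
book = PromptBook(background="a sunny park")
book.add("dog_1", "a white dog", (0.1, 0.5, 0.3, 0.4))   # turn 1
book.add("cat_1", "a black cat", (0.6, 0.5, 0.3, 0.4))   # turn 2
book.move("dog_1", (0.4, 0.5, 0.3, 0.4))                 # turn 3: reposition
print(book.scene_prompt())
```

Keeping each character's prompt under a stable ID, separate from its layout, is one plausible way to let later turns change position or scene composition without rewriting the character's description, which is the kind of contextual consistency the summary describes.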