Vol. 43, No. 4, Article 52. Publication date: July 2024. | YOAD TEWEL, NVIDIA, Israel and Tel Aviv University, Israel; OMRI KADURI, Independent Scientist, Israel; RINON GAL, NVIDIA, Israel and Tel Aviv University, Israel; YONI KASTEN, NVIDIA, Israel; LIOR WOLF, Tel Aviv University, Israel; GAL CHECHIK, NVIDIA, Israel; YUVAL ATZMON, NVIDIA, Israel
**ConsiStory** is a training-free approach for generating consistent images from text prompts: it keeps the same subject identity across a set of diverse prompts. The method uses the internal activations of a pre-trained text-to-image (T2I) diffusion model to enforce subject consistency, with no additional optimization or pre-training. Its key components are a subject-driven shared attention block and a correspondence-based feature injection mechanism, which together promote subject consistency while preserving layout diversity. Evaluated against various baselines, the method shows superior subject consistency and text alignment, achieving state-of-the-art results without any optimization steps. **ConsiStory** also extends to multi-subject scenarios and enables training-free personalization for common objects. The paper further introduces a new benchmark dataset for consistency evaluation and discusses limitations and future directions.
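To make the idea of a subject-driven shared attention block more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes one plausible reading of the description: each image in a batch attends to all of its own tokens plus only the subject tokens of the other images. The function name `shared_attention`, the data layout, and the boolean `subject_masks` input are hypothetical and chosen for illustration only; how the subject masks are obtained is outside the scope of this sketch.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def shared_attention(queries, keys, values, subject_masks):
    """Self-attention shared across a batch of images (illustrative only).

    queries, keys, values: indexed as [image][token] -> feature vector.
    subject_masks: indexed as [image][token] -> bool, True where a token is
    assumed to belong to the subject (hypothetical input).
    """
    num_images = len(queries)
    dim = len(queries[0][0])
    outputs = []
    for i in range(num_images):
        # Tokens visible to image i: all of its own tokens, plus only the
        # subject tokens of every other image in the batch.
        visible_keys, visible_values = [], []
        for j in range(num_images):
            for t in range(len(keys[j])):
                if j == i or subject_masks[j][t]:
                    visible_keys.append(keys[j][t])
                    visible_values.append(values[j][t])
        image_out = []
        for q in queries[i]:
            # Scaled dot-product scores against every visible key.
            scores = [
                sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                for k in visible_keys
            ]
            weights = softmax(scores)
            # Weighted average of the visible value vectors.
            image_out.append([
                sum(w * v[c] for w, v in zip(weights, visible_values))
                for c in range(len(visible_values[0]))
            ])
        outputs.append(image_out)
    return outputs
```

With every mask entry set to `False`, the function reduces to ordinary per-image self-attention. Marking a token as subject lets the other images draw on its features, which is the sharing behavior the summary describes, while background tokens stay private to their own image.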