Transparent Image Layer Diffusion using Latent Transparency

July 2024 | Lvmin Zhang, Maneesh Agrawala
This paper introduces "latent transparency," an approach for generating transparent images and multiple transparent layers with large-scale pretrained latent diffusion models. The method encodes alpha-channel transparency into the latent manifold of a pretrained latent diffusion model, enabling the generation of single transparent images or sets of transparent layers. Because the added transparency is regulated as a latent offset that only minimally perturbs the original latent distribution, the production-ready quality of the underlying diffusion model is preserved. The model is trained on 1M transparent image layer pairs collected with a human-in-the-loop scheme, covering a diversity of content topics and styles.

The framework can be attached to different open-source image generators or adapted to various conditional control systems, supporting applications such as foreground/background-conditioned layer generation, joint layer generation, and structural control of layer contents. A shared attention mechanism yields layers that blend consistently and harmoniously, and LoRAs adapt the model to different layer conditions. The approach also extends to multi-layer generation and transfers to community models, LoRAs, and prompt styles without additional training, demonstrating its versatility across creative and professional domains.

In a user study, participants preferred the natively generated transparent content over prior ad-hoc solutions, such as generating and then matting, in 97% of cases, and rated the quality of the generated transparent images as comparable to real commercial transparent assets like those on Adobe Stock.
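To make the latent-offset idea concrete, here is a toy numerical sketch, not the paper's learned networks: a fixed random unit direction stands in for the trained latent-transparency encoder/decoder, the latent dimension and scale are made up, and the point is only that a small, decodable offset can carry transparency while leaving the latents nearly unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 256, 64                           # hypothetical batch size and latent dimension

z = rng.standard_normal((n, d))          # stand-in for latents from a pretrained VAE
alpha = rng.uniform(0.0, 1.0, (n, 1))    # one toy transparency value per sample

# Toy "encoder": write alpha into a fixed unit direction as a small offset.
direction = rng.standard_normal((1, d))
direction /= np.linalg.norm(direction)
scale = 0.05                             # small scale keeps the latent distribution nearly intact

z_perp = z - (z @ direction.T) @ direction          # clear the chosen direction
z_adj = z_perp + scale * (alpha - 0.5) @ direction  # latent plus transparency offset

# Toy "decoder": read alpha back by projecting onto the known direction.
alpha_hat = (z_adj @ direction.T) / scale + 0.5

# The perturbation to the latents stays small relative to their overall magnitude.
offset_ratio = np.linalg.norm(z_adj - z) / np.linalg.norm(z)
```

In this sketch `alpha_hat` recovers `alpha` exactly while `offset_ratio` stays around 0.13, illustrating how transparency can ride along as a regulated offset; the actual paper learns this encoding with neural networks rather than a fixed projection.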
The framework is also efficient at inference time. Its main limitation is a trade-off between generating clean transparent elements and achieving harmonious blending across layers. The paper concludes that the proposed approach is effective for generating transparent images and layers and has potential for broader applications in image editing and generation.
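As background for the blending discussed above, independently generated transparent layers are combined with the standard Porter-Duff "over" operator; a minimal NumPy sketch (the function name and array shapes are illustrative, not from the paper):

```python
import numpy as np

def over(fg_rgb, fg_a, bg_rgb, bg_a):
    """Porter-Duff 'over': composite a straight-alpha foreground onto a background.

    fg_rgb/bg_rgb: floats in [0, 1], shape (..., 3); fg_a/bg_a: alphas, shape (..., 1) or scalar.
    """
    out_a = fg_a + bg_a * (1.0 - fg_a)
    # Blend premultiplied colors, then un-premultiply (guard against fully transparent output).
    out_rgb = (fg_rgb * fg_a + bg_rgb * bg_a * (1.0 - fg_a)) / np.maximum(out_a, 1e-8)
    return out_rgb, out_a

# A half-transparent red layer over an opaque white background.
rgb, a = over(np.array([1.0, 0.0, 0.0]), np.array([0.5]),
              np.array([1.0, 1.0, 1.0]), np.array([1.0]))
# rgb -> [1.0, 0.5, 0.5], a -> [1.0]
```

Stacking multiple generated layers is just repeated application of this operator from back to front, which is why per-layer alpha quality directly affects how harmonious the final blend looks.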