15 Mar 2024 | Yizhi Song, Zhifei Zhang, Zhe Lin, Scott Cohen, Brian Price, Jianming Zhang, Soo Ye Kim, He Zhang, Wei Xiong, Daniel Aliaga
IMPRINT is a novel two-stage generative model for object compositing that excels in identity preservation and background harmonization. Its training framework decouples identity preservation from compositing: the first stage performs context-agnostic, identity-preserving pretraining of the object encoder, teaching it a view-invariant representation conducive to detail preservation; the second stage leverages this representation to learn seamless harmonization of the object with the background. IMPRINT additionally incorporates a shape-guidance mechanism for user-directed control over the compositing process. The model builds on a pre-trained text-to-image diffusion model for generation, uses a pre-trained image encoder to capture detailed object features, and is trained on a combination of image and video datasets. Extensive experiments on two test sets, along with a user study, show that IMPRINT significantly outperforms existing methods in realism and fidelity, achieving better identity preservation and more flexible adaptation to the background in terms of color and geometry. Together with shape-guided generation, this flexibility makes IMPRINT a promising approach for high-fidelity, controllable object compositing.
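To make the two-stage idea concrete, here is a minimal PyTorch sketch. All names and choices in it (ObjectEncoder, the cosine-similarity loss in stage 1, the denoiser call signature, the batch layout) are illustrative assumptions, not IMPRINT's actual architecture or objectives; the paper builds on a pre-trained text-to-image diffusion model and a pre-trained image encoder rather than the toy modules shown here.

```python
# Hypothetical sketch of a two-stage compositing pipeline in the spirit of IMPRINT.
# Module names, losses, and signatures are stand-ins, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ObjectEncoder(nn.Module):
    """Toy stand-in for the identity-preserving object encoder."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # L2-normalized embedding; two views of the same object should map close together.
        return F.normalize(self.backbone(x), dim=-1)

def stage1_identity_pretraining(encoder, view_pairs, lr=1e-4):
    """Stage 1 (context-agnostic): pull embeddings of different views of the
    same object together so the representation becomes view-invariant."""
    opt = torch.optim.Adam(encoder.parameters(), lr=lr)
    for view_a, view_b in view_pairs:  # each a (B, 3, H, W) tensor of the same objects
        z_a, z_b = encoder(view_a), encoder(view_b)
        loss = 1.0 - F.cosine_similarity(z_a, z_b).mean()
        opt.zero_grad(); loss.backward(); opt.step()

def stage2_compositing(encoder, denoiser, batches, lr=1e-5):
    """Stage 2: freeze the encoder and train the diffusion generator to
    harmonize the object with the background, conditioned on the frozen
    identity embedding and an optional user-provided shape mask."""
    encoder.requires_grad_(False)
    opt = torch.optim.Adam(denoiser.parameters(), lr=lr)
    for obj, noisy_scene, t, shape_mask, target_noise in batches:
        with torch.no_grad():
            identity = encoder(obj)            # detail-preserving features, kept fixed
        pred = denoiser(noisy_scene, t, identity, shape_mask)
        loss = F.mse_loss(pred, target_noise)  # standard denoising objective
        opt.zero_grad(); loss.backward(); opt.step()
```

The design point the sketch illustrates is the decoupling: because the encoder is pretrained for identity and then frozen in stage 2, the generator can freely adapt the object's color and geometry to the background without eroding the representation that preserves its identity.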