IMPRINT: Generative Object Compositing by Learning Identity-Preserving Representation


15 Mar 2024 | Yizhi Song¹*, Zhifei Zhang², Zhe Lin², Scott Cohen², Brian Price², Jianming Zhang², Soo Ye Kim², He Zhang², Wei Xiong², Daniel Aliaga¹
**Institution:** Purdue University (¹), Adobe Research (²)

**Abstract:** Generative object compositing is a promising direction in image editing, but weak identity preservation limits its practical use. This paper introduces IMPRINT, a diffusion-based generative model that decouples identity preservation from compositing through a two-stage learning framework. The first stage pretrains an object encoder to learn view-invariant, detail-preserving representations; the second stage uses these representations to seamlessly harmonize the object with the background. IMPRINT also incorporates a shape-guidance mechanism for user-controlled compositing. Extensive experiments demonstrate that IMPRINT outperforms existing methods in both identity preservation and composition quality.

**Contributions:**
- Context-agnostic ID-preserving pretraining for detailed, view-invariant object representations.
- A two-stage framework that separately ensures object fidelity and geometric variation.
- Mask control for enhanced shape guidance and generation flexibility.
- A comprehensive evaluation of appearance retention, highlighting the factors that affect identity preservation.

**Related Work:**
- Surveys image compositing and subject-driven image generation, emphasizing the trade-off between identity preservation and background harmony.
- Discusses previous methods and their limitations in identity preservation and geometric adjustment.

**Approach:**
- **Context-Agnostic ID-Preserving Stage:** Trains an image encoder to learn view-invariant features via a supervised object view reconstruction task.
- **Compositing Stage:** Finetunes the encoder and generator to composite the object into the background, guided by the ID-preserving representations.
- **Paired Data Generation:** Collects high-resolution image and video datasets for better compositing quality.
- **Training Strategies:** Uses a sequential collaborative training scheme to stabilize training and improve identity preservation.

**Experiments:**
- **Quantitative Evaluation:** IMPRINT outperforms baselines in both realism and fidelity.
- **Qualitative Evaluation:** IMPRINT preserves object details better than competing methods.
- **User Study:** Participants prefer IMPRINT for realism and detail preservation.

**Conclusion:** IMPRINT achieves state-of-the-art identity preservation and background harmonization for generative object compositing. Limitations include large view changes and inconsistent rendering of small text and logos. Future work could explore 3D models or NeRF representations to address these limitations.
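The two-stage idea above can be illustrated with a toy sketch: stage 1 pools information across multiple views of the same object into a view-invariant representation, and stage 2 uses a mask for shape guidance, writing the object representation inside the mask while keeping the background elsewhere. All function names and the toy feature vectors are illustrative assumptions for exposition, not the paper's actual diffusion-based implementation.

```python
# Toy sketch of IMPRINT's two-stage framework (illustrative only; the
# real system uses a pretrained image encoder and a diffusion generator).

def encode_object(views):
    """Stage 1 (toy): a view-invariant object representation, obtained
    here by element-wise mean-pooling features across views. In the
    paper, an encoder is pretrained on object view reconstruction so
    that different views of one object yield consistent embeddings."""
    n = len(views)
    return [sum(v[i] for v in views) / n for i in range(len(views[0]))]

def composite(background, embedding, mask):
    """Stage 2 (toy): shape-guided compositing. Inside the mask the
    output is driven by the ID-preserving embedding; outside it, the
    background is left untouched."""
    return [e if m else b for b, e, m in zip(background, embedding, mask)]

# Two views of the same object produce one pooled, view-invariant code.
views = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
emb = encode_object(views)  # [2.0, 2.0, 2.0] regardless of view order
out = composite([9.0, 9.0, 9.0], emb, [False, True, True])
print(out)  # [9.0, 2.0, 2.0]
```

The point of the decoupling is visible even in this sketch: `encode_object` never sees the background, so identity features are learned context-agnostically, and only `composite` handles harmonization.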