BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion


11 Mar 2024
**Authors:** Xuan Ju, Xian Liu, Xintao Wang, Yuxuan Bian, Ying Shan, and Qiang Xu
**Institution:** ARC Lab, Tencent PCG; The Chinese University of Hong Kong
**GitHub:** https://github.com/TencentARC/BrushNet

**Abstract:** Image inpainting, the process of restoring corrupted images, has seen significant advances with the advent of diffusion models (DMs). However, current DM adaptations for inpainting often suffer from semantic inconsistencies and reduced image quality. To address these challenges, BrushNet introduces a novel paradigm: the masked image features and the noisy latent are processed in separate branches. This division significantly reduces the model's learning load and allows essential masked image information to be incorporated in a hierarchical fashion. BrushNet is a plug-and-play dual-branch model engineered to embed pixel-level masked image features into any pre-trained DM, ensuring coherent and enhanced inpainting results. Additionally, BrushData and BrushBench are introduced to support segmentation-based inpainting training and performance assessment. Extensive experiments demonstrate BrushNet's superior performance over existing models across seven key metrics, including image quality, masked region preservation, and textual coherence.

**Keywords:** Image Inpainting · Diffusion Models · Image Generation

**Introduction:** Image inpainting aims to restore missing regions of an image while maintaining overall coherence. Recent advances in diffusion models have enabled flexible user control with semantic and structural conditions. Existing diffusion-based text-guided inpainting methods fall into two categories: sampling strategy modification and dedicated inpainting models.
Sampling strategy modification methods alter the standard denoising process by sampling the masked regions from a pre-trained diffusion model, while dedicated inpainting models fine-tune base diffusion models to take masks and masked images as additional inputs. Both approaches have limitations, such as limited perceptual knowledge of mask boundaries and of the unmasked image region's context, which leads to incoherent results.

**Motivation:** To address these limitations, BrushNet adds a separate branch dedicated to masked image processing. This design yields a more effective architecture for image inpainting, improving the extraction of image features and enabling dense per-pixel control. A VAE encoder processes the masked image, and the full UNet features of the additional branch are incorporated layer-by-layer into the pre-trained UNet in a hierarchical fashion. Cross-attention layers are removed from the additional branch so that it considers only pure image information.

**Method:** BrushNet employs a dual-branch strategy for inserting masked image guidance, together with a blending operation that uses a blurred mask for better unmasked region preservation. Flexible control is achieved by adjusting the scale of the added features. Quantitative and qualitative comparisons show that BrushNet outperforms existing methods in image quality, masked region preservation, and text alignment.

**Evaluation:** BrushNet is evaluated on BrushBench across seven key metrics, demonstrating superior image quality, masked region preservation, and textual coherence compared with existing inpainting models.
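The layer-by-layer feature insertion described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the function names, the per-layer 1x1 "zero convolution" (a ControlNet-style zero-initialized projection), and the toy feature shapes are all assumptions made for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def zero_conv(x, weight):
    """Hypothetical 1x1 'zero convolution': a per-channel linear map whose
    weights start at zero, so the extra branch contributes nothing at the
    start of training (assumption: a ControlNet-style zero-init convention)."""
    # x: (C_in, H, W), weight: (C_out, C_in)
    return np.einsum('oc,chw->ohw', weight, x)

def dual_branch_step(base_feats, branch_feats, zero_weights, scale=1.0):
    """Add branch features into the frozen base UNet layer-by-layer.
    `scale` models the user-adjustable strength of masked-image guidance."""
    fused = []
    for f_base, f_branch, w in zip(base_feats, branch_feats, zero_weights):
        fused.append(f_base + scale * zero_conv(f_branch, w))
    return fused

# Toy feature pyramids: three layers of (C, H, W) features.
shapes = [(4, 8, 8), (8, 4, 4), (16, 2, 2)]
base    = [rng.standard_normal(s) for s in shapes]
branch  = [rng.standard_normal(s) for s in shapes]
weights = [np.zeros((s[0], s[0])) for s in shapes]  # zero-initialized

out = dual_branch_step(base, branch, weights, scale=1.0)
# With zero-initialized weights, the branch is a no-op initially,
# which is why the pre-trained UNet's behavior is preserved at first.
```

Setting `scale=0.0` disables the guidance entirely, which mirrors how a single scale knob can trade off masked-image fidelity against the base model's freedom.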
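The blurred-mask blending operation could look like the sketch below. Assumptions are flagged in the comments: the exact blur kernel (a box blur here rather than whatever the paper uses) and whether blending happens in pixel or latent space are illustrative choices, not details from the source.

```python
import numpy as np

def box_blur(mask, k=3):
    """Simple box blur as a stand-in for the mask-blurring step
    (assumption: the exact kernel is an implementation detail)."""
    pad = k // 2
    padded = np.pad(mask, pad, mode='edge')
    out = np.zeros_like(mask, dtype=float)
    H, W = mask.shape
    for i in range(H):
        for j in range(W):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out

def blend(generated, original, mask, k=3):
    """Paste generated content into the masked region using a blurred
    mask so the seam transitions smoothly instead of showing a hard edge.
    mask: 1 inside the hole (to inpaint), 0 in the preserved region."""
    m = box_blur(mask.astype(float), k)[..., None]  # (H, W, 1) for RGB
    return m * generated + (1.0 - m) * original

H, W = 8, 8
original  = np.zeros((H, W, 3))   # toy "unmasked" image
generated = np.ones((H, W, 3))    # toy model output
mask = np.zeros((H, W)); mask[2:6, 2:6] = 1.0

result = blend(generated, original, mask)
# Far from the mask the original pixels are preserved exactly; deep
# inside the hole the generated content dominates; the boundary is a blend.
```

The soft transition at the mask boundary is what gives "better unmasked region preservation" without a visible seam: preserved pixels are copied exactly, while only a thin boundary band mixes the two sources.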