4 Mar 2024 | Zhengyao Lv¹, Yuxiang Wei², Wangmeng Zuo²,³, Kwan-Yee K. Wong¹ (✉)
The paper introduces PLACE (adaPtive LAyout-semantiC fusion modulE), a method for semantic image synthesis that builds on pre-trained text-to-image models to produce high-quality images whose semantics and layout are both consistent with the input. PLACE has two components: a layout control map (LCM), which represents layout information in the feature space, and an adaptive layout-semantic fusion module, which combines layout and semantic features in a timestep-adaptive manner. During fine-tuning, the Semantic Alignment (SA) loss improves layout alignment, while the Layout-Free Prior Preservation (LFP) loss preserves the priors of the pre-trained model, which helps visual quality and semantic consistency. Experiments on datasets such as ADE20K and COCO-Stuff show that PLACE outperforms existing methods in visual quality, semantic consistency, and layout alignment, in both in-distribution and out-of-distribution (new-domain) synthesis. The stated contributions are the LCM, the adaptive layout-semantic fusion module, and the SA and LFP losses.
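As a rough illustration of the two components, the sketch below downsamples a label map into per-cell class proportions (one plausible reading of a layout map "in the feature space") and blends layout weights with semantic weights using a timestep-dependent coefficient. The summary does not say how the LCM is computed or how the fusion weight depends on the timestep, so everything here is an assumption made for illustration: the function names `layout_control_map`, `fusion_weight`, and `fuse`, the fixed sigmoid schedule, and the convex blend are hypothetical and should not be read as the authors' implementation.

```python
import math


def layout_control_map(label_map, num_classes, cell):
    """Downsample a pixel-level label map to per-cell class proportions.

    label_map: H x W nested list of integer class ids, with H and W
    divisible by `cell`. Returns an (H/cell) x (W/cell) grid in which
    every entry is a list of `num_classes` proportions summing to 1.
    (Assumed construction; the source does not define the LCM.)
    """
    height, width = len(label_map), len(label_map[0])
    grid = []
    for top in range(0, height, cell):
        row = []
        for left in range(0, width, cell):
            counts = [0] * num_classes
            for y in range(top, top + cell):
                for x in range(left, left + cell):
                    counts[label_map[y][x]] += 1
            row.append([c / (cell * cell) for c in counts])
        grid.append(row)
    return grid


def fusion_weight(t, num_steps, sharpness=10.0):
    """Hypothetical schedule for the weight placed on layout.

    Close to 1 at large (noisy) timesteps and close to 0 at small ones.
    A fixed sigmoid stands in for whatever timestep dependence the
    method actually uses.
    """
    return 1.0 / (1.0 + math.exp(-sharpness * (t / num_steps - 0.5)))


def fuse(layout_weights, semantic_weights, alpha):
    """Convex blend of two per-class weight vectors, renormalized to sum to 1."""
    mixed = [alpha * l + (1.0 - alpha) * s
             for l, s in zip(layout_weights, semantic_weights)]
    total = sum(mixed)
    return [m / total for m in mixed]


if __name__ == "__main__":
    labels = [
        [0, 0, 1, 1],
        [0, 1, 1, 1],
        [2, 2, 2, 2],
        [2, 2, 0, 0],
    ]
    lcm = layout_control_map(labels, num_classes=3, cell=2)
    semantic = [0.2, 0.3, 0.5]  # made-up semantic weights for one location
    for t in (900, 500, 100):
        alpha = fusion_weight(t, num_steps=1000)
        print(t, round(alpha, 3), fuse(lcm[0][0], semantic, alpha))
```

Under this toy schedule, the blend follows the layout proportions at noisy timesteps and shifts toward the semantic weights as denoising proceeds. That direction is itself an assumption of the sketch, not a claim about how PLACE behaves.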