Learning Continuous 3D Words for Text-to-Image Generation

13 Feb 2024 | Ta-Ying Cheng, Matheus Gadelha, Thibault Groueix, Matthew Fisher, Radomír Měch, Andrew Markham, Niki Trigoni
The paper introduces Continuous 3D Words, an approach that gives users fine-grained control over several attributes in text-to-image generation. These attributes, such as illumination direction, non-rigid shape change, orientation, and camera parameters, are represented as continuous tokens that can be adjusted with sliders alongside the text prompt. The method is trained using a single 3D mesh and a rendering engine, with minimal runtime and memory costs. The authors propose a two-stage training strategy and ControlNet augmentations to disentangle object identity from the attributes and to prevent overfitting. Experiments show that Continuous 3D Words outperforms existing baselines in both quantitative and qualitative evaluations, with better generalization and aesthetic quality. Because the method is lightweight, it is accessible for a wide range of applications in the vision community.
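To make the idea of a slider-controlled continuous token concrete, here is a minimal sketch of one way a scalar attribute value could be mapped to a token embedding. The summary above does not describe the architecture, so everything in it is an assumption for illustration: the name ContinuousWord, the Fourier-feature encoding, the tiny two-layer network with random (untrained) weights, and the embedding size.

```python
import math
import random


def fourier_features(value, num_freqs=4):
    """Encode a scalar slider value as sin/cos features (an assumed encoding)."""
    feats = []
    for k in range(num_freqs):
        angle = value * (2 ** k) * math.pi
        feats.extend([math.sin(angle), math.cos(angle)])
    return feats


def _random_matrix(rng, rows, cols):
    # Gaussian weights scaled by 1/sqrt(rows); stands in for learned weights.
    scale = 1.0 / math.sqrt(rows)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def _matvec(x, w):
    # x has length rows, w is rows x cols; returns a vector of length cols.
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]


class ContinuousWord:
    """Hypothetical mapping from a slider value to a token embedding."""

    def __init__(self, embed_dim=16, hidden_dim=8, num_freqs=4, seed=0):
        rng = random.Random(seed)
        self.num_freqs = num_freqs
        self.w1 = _random_matrix(rng, 2 * num_freqs, hidden_dim)
        self.w2 = _random_matrix(rng, hidden_dim, embed_dim)

    def embed(self, value):
        x = fourier_features(value, self.num_freqs)
        h = [math.tanh(v) for v in _matvec(x, self.w1)]
        return _matvec(h, self.w2)


# Usage: one embedding per slider position, e.g. a normalized light direction.
word = ContinuousWord()
embedding = word.embed(0.25)
```

In a real system, the resulting vector would stand in for the embedding of a placeholder token in the prompt, and the weights would be learned from renderings of the training mesh rather than drawn at random.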