VP3D: Unleashing 2D Visual Prompt for Text-to-3D Generation


25 Mar 2024 | Yang Chen, Yingwei Pan, Haibo Yang, Ting Yao, and Tao Mei
This paper introduces VP3D, a text-to-3D generation framework that exploits a 2D visual prompt to boost the quality of generated 3D models. VP3D first leverages an off-the-shelf 2D diffusion model to synthesize a high-quality image from the input text; this image then serves as a visual prompt that, together with the text prompt, guides the optimization of the 3D model. In addition, VP3D supervises the optimization with differentiable reward functions, including a human-feedback reward and a visual-consistency reward, which encourage the rendered views of the 3D model to be semantically aligned with the text prompt and visually consistent with the visual prompt.

VP3D is evaluated on the T³Bench benchmark, which contains 300 diverse text prompts across three categories. The results show that VP3D outperforms existing state-of-the-art methods on both quantitative and qualitative metrics. Moreover, by replacing the self-generated image with a user-specified reference image as the visual prompt, VP3D can also produce stylized 3D content.

The key contributions of VP3D are: (1) the integration of a 2D visual prompt into the text-to-3D generation process, (2) the use of differentiable reward functions to improve the alignment between the generated 3D model and the input prompts, and (3) the ability to generate stylized 3D content. Together, these demonstrate that VP3D produces high-quality 3D models that are both visually and semantically aligned with the input text and visual prompts.
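To make the optimization recipe concrete, below is a minimal PyTorch sketch of one update step in this style. It is not the authors' released code: the component interfaces (render, diffusion.eps, reward_hf, reward_vc), the timestep range, and the loss weights are illustrative assumptions. The score-distillation term follows the standard SDS formulation, with the diffusion model additionally conditioned on the 2D visual prompt, and the reward terms are added as differentiable penalties, as the summary describes.

```python
import torch

# Hypothetical components -- names and interfaces are illustrative:
#   render(camera)                 -> differentiable RGB rendering of the 3D model
#   diffusion.eps(x_t, t, te, ve)  -> noise prediction conditioned on the text
#                                     embedding te AND the visual-prompt embedding ve
#   reward_hf(img, text)           -> scalar human-feedback score
#                                     (e.g. an ImageReward-style model)
#   reward_vc(img, vp_image)       -> scalar visual-consistency score between a
#                                     rendering and the visual prompt image

def vp3d_step(render, diffusion, reward_hf, reward_vc,
              camera, text, text_emb, vp_emb, vp_image,
              alphas_cumprod, lambda_hf=0.1, lambda_vc=0.1):
    img = render(camera)                            # (1, 3, H, W), requires grad

    # --- Visual-prompted score distillation term ---
    t = torch.randint(20, 980, (1,), device=img.device)   # assumed timestep range
    noise = torch.randn_like(img)
    a_t = alphas_cumprod[t].view(1, 1, 1, 1)
    x_t = a_t.sqrt() * img + (1.0 - a_t).sqrt() * noise   # forward diffusion of the rendering
    with torch.no_grad():
        # Noise prediction conditioned on BOTH the text and the visual prompt.
        eps_pred = diffusion.eps(x_t, t, text_emb, vp_emb)
    w = 1.0 - a_t
    grad = w * (eps_pred - noise)
    # Standard SDS surrogate: its gradient w.r.t. img equals `grad`.
    loss_sds = (grad.detach() * img).sum()

    # --- Differentiable reward terms (maximize scores = minimize negatives) ---
    loss_reward = -(lambda_hf * reward_hf(img, text)
                    + lambda_vc * reward_vc(img, vp_image))

    return loss_sds + loss_reward
```

Note the asymmetry in how gradients flow: the SDS term uses the usual detached-gradient surrogate, so the diffusion model itself is never back-propagated through, whereas the human-feedback and visual-consistency scores are differentiated directly through the renderer. That direct path is what makes them "differentiable reward functions" in the sense used above.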