Make-Your-3D: Fast and Consistent Subject-Driven 3D Content Generation

2024-03-14 | Fangfu Liu, Hanyang Wang, Weiliang Chen, Haowen Sun, and Yueqi Duan
Make-Your-3D is a method for fast and consistent subject-driven 3D content generation. Built on diffusion models, it lets users personalize high-fidelity, consistent 3D content from a single image of a subject, with text-driven modifications, within 5 minutes. The key idea is to harmonize the distributions of a multi-view diffusion model and an identity-specific 2D generative model, aligning both with the distribution of the desired 3D subject. To this end, a co-evolution framework reduces the variance between the two distributions: each model learns from the other through identity-aware optimization and subject-prior optimization. Extensive experiments show that the method produces high-quality, consistent, and subject-specific 3D content with text-driven modifications that are unseen in the subject image. It is 36× faster than DreamBooth3D and requires only a single image as input, eliminating the need for multiple subject-specific images.

The method is evaluated on the dataset used in DreamBooth3D as well as on open-vocabulary wild images of a subject captured in different styles. It produces vivid, high-fidelity 3D assets that adhere closely to the given subject while respecting the contextualization in the input text prompts. Compared with DreamBooth3D, it achieves superior quality, resolution, and consistency; compared with a recent multi-view DreamBooth implementation, it better preserves subject identity without overfitting to data bias. It also remains robust in various open-vocabulary settings and produces high-quality results on failure cases of DreamBooth3D.

Applications include stylization, accessorization, motion modification, and human personalization, where the method can change attributes such as hair and clothes. In a user study, participants rated 3D consistency, subject fidelity, prompt fidelity, and overall quality, and significantly preferred the method on all four aspects. An ablation study of identity-aware optimization and subject-prior optimization shows that omitting either component degrades subject-driven fidelity. These results further support the effectiveness of the co-evolution framework and highlight the method's potential for subject-driven customization.
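To make the co-evolution framework concrete, here is a minimal PyTorch-style sketch of the alternating optimization it describes. Everything below is illustrative: the names (MultiViewModel, Personalized2DModel, co_evolve) are hypothetical placeholders rather than the authors' actual models or API, and tiny linear layers stand in for the real diffusion models. The sketch only shows the training pattern in which each model is optimized on samples produced by the other, nudging both distributions toward the desired 3D subject.

```python
# Hypothetical sketch of the co-evolution idea (not the authors' implementation).
import torch
import torch.nn as nn


class MultiViewModel(nn.Module):
    """Stand-in for a multi-view diffusion model."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(64, 64)

    def forward(self, x):
        # "Renders" pseudo multi-view features of the subject.
        return self.net(x)


class Personalized2DModel(nn.Module):
    """Stand-in for an identity-specific 2D generative model."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(64, 64)

    def forward(self, x):
        # "Generates" subject-specific 2D features.
        return self.net(x)


def co_evolve(subject_image, steps=100, lr=1e-3):
    mv_model, p2d_model = MultiViewModel(), Personalized2DModel()
    opt_mv = torch.optim.Adam(mv_model.parameters(), lr=lr)
    opt_2d = torch.optim.Adam(p2d_model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()

    for _ in range(steps):
        # Identity-aware optimization: the multi-view model learns to match
        # subject-specific samples from the (frozen) personalized 2D model.
        with torch.no_grad():
            identity_target = p2d_model(subject_image)
        loss_mv = loss_fn(mv_model(subject_image), identity_target)
        opt_mv.zero_grad()
        loss_mv.backward()
        opt_mv.step()

        # Subject-prior optimization: the 2D model learns from multi-view
        # samples of the subject produced by the (frozen) multi-view model.
        with torch.no_grad():
            view_target = mv_model(subject_image)
        loss_2d = loss_fn(p2d_model(subject_image), view_target)
        opt_2d.zero_grad()
        loss_2d.backward()
        opt_2d.step()

    return mv_model, p2d_model


if __name__ == "__main__":
    subject = torch.randn(1, 64)  # placeholder for an encoded subject image
    co_evolve(subject)
```

In the actual method, the two components would be pretrained diffusion models, and each optimization step would use a denoising or fine-tuning objective rather than the MSE regression in this toy loop; the sketch is only meant to show the mutual-learning structure described above.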