Morphable Diffusion: 3D-Consistent Diffusion for Single-image Avatar Creation

2 Apr 2024 | Xiyi Chen, Marko Mihajlovic, Shaofei Wang, Sergey Prokudin, Siyu Tang
This paper introduces a morphable diffusion model that enables consistent and controllable novel view synthesis of humans from a single image. By integrating a 3D morphable model into a state-of-the-art multi-view-consistent diffusion approach, the method generates 3D-consistent, photorealistic images from novel viewpoints and can create a fully 3D-consistent, animatable, photorealistic avatar from a single image of an unseen subject. The code for the project is publicly available.

The method builds on a recent multi-view-consistent diffusion model, improving its reconstruction quality and allowing explicit manipulation of the synthesized images. The 3D morphable model guides an uplifting of the noisy image features, associating them with the corresponding mesh vertices in 3D space (a hypothetical sketch of this step follows below). The model is trained with a shuffled training scheme that enables the generation of new facial expressions for an unseen subject from a single image, regardless of the expression in the input image (a sampling sketch appears after the uplifting example).

The model is evaluated on the FaceScape and THuman 2.0 datasets on both novel view and novel expression synthesis tasks. It produces more realistic face images, with more accurate facial expressions and better resemblance to the input subject, than existing state-of-the-art avatar creation models, and it outperforms the baselines on all metrics for novel facial expression synthesis.

The method has several limitations. Its output resolution is capped at 256x256 by components inherited from previous works. It struggles with hairstyle reconstruction and out-of-distribution ethnicities due to the strong bias of the training data towards hair caps and Asian subjects, and it does not generalize well to camera parameters that differ significantly from those used during training. Within these bounds, it maintains the 3D consistency of the generated imagery, preserves the target facial expression, and generates realistic facial images under novel expressions from various views while retaining high visual quality.
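To make the uplifting step concrete, here is a minimal, hypothetical sketch of how 2D features extracted from the noisy input image could be gathered at the projected locations of the morphable-model vertices. The function names, tensor shapes, and pinhole projection are illustrative assumptions, not the authors' exact implementation.

```python
# Hypothetical sketch of the feature "uplifting" step: 2D features from the
# noisy image are sampled at the projected positions of 3D morphable-model
# vertices and attached to those vertices. Shapes and names are assumptions.
import torch
import torch.nn.functional as F

def uplift_features(feat_2d, vertices, K, w2c):
    """Attach per-vertex features by projecting mesh vertices into the image.

    feat_2d:  (1, C, H, W) feature map from the noisy input image
    vertices: (V, 3) morphable-model vertices in world coordinates
    K:        (3, 3) camera intrinsics
    w2c:      (4, 4) world-to-camera extrinsics
    Returns:  (V, C) per-vertex features
    """
    _, C, H, W = feat_2d.shape
    # Transform vertices into camera space (homogeneous coordinates).
    v_h = torch.cat([vertices, torch.ones_like(vertices[:, :1])], dim=-1)  # (V, 4)
    v_cam = (w2c @ v_h.T).T[:, :3]                                         # (V, 3)
    # Pinhole projection to pixel coordinates.
    v_pix = (K @ v_cam.T).T
    v_pix = v_pix[:, :2] / v_pix[:, 2:3].clamp(min=1e-6)                   # (V, 2)
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    grid = torch.stack([
        2.0 * v_pix[:, 0] / (W - 1) - 1.0,
        2.0 * v_pix[:, 1] / (H - 1) - 1.0,
    ], dim=-1).view(1, -1, 1, 2)                                           # (1, V, 1, 2)
    # Bilinearly sample the feature map at each projected vertex.
    sampled = F.grid_sample(feat_2d, grid, align_corners=True)             # (1, C, V, 1)
    return sampled[0, :, :, 0].T                                           # (V, C)
```

The per-vertex features can then serve as 3D-anchored conditioning for the multi-view diffusion branches, which is what ties the denoising process to the morphable model's geometry.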
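The shuffled training scheme could plausibly be implemented as a pair-sampling rule like the one below, where the conditioning image and the supervised target views are drawn from different expressions of the same subject. The dataset layout (frames[subject][expression] -> list of view images) and all names are assumptions for illustration, not the paper's code.

```python
# Minimal sketch of a "shuffled" pair-sampling scheme for training: condition
# on one expression, supervise with target views of a different expression of
# the same subject. Assumes each subject has at least two expressions.
import random

def sample_shuffled_pair(frames, subject):
    """Pick an input view and target views with mismatched expressions."""
    expressions = list(frames[subject].keys())
    expr_in, expr_tgt = random.sample(expressions, 2)  # two distinct expressions
    input_view = random.choice(frames[subject][expr_in])
    target_views = frames[subject][expr_tgt]           # multi-view supervision set
    return input_view, target_views

# Usage: during training, the diffusion model receives input_view as the
# conditioning image together with the 3DMM mesh fitted to expr_tgt, and is
# asked to denoise the target_views. This decoupling of input and target
# expressions is what lets the trained model synthesize new expressions for
# an unseen subject from a single image.
```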