CharacterGen: Efficient 3D Character Generation from Single Images with Multi-View Pose Canonicalization

July 2024 | Hao-Yang Peng, BNRist, Department of Computer Science and Technology, Tsinghua University, China; Jia-Peng Zhang, Zhili College, Tsinghua University, China; Meng-Hao Guo, BNRist, Department of Computer Science and Technology, Tsinghua University, China; Yan-Pei Cao, VAST, China; Shi-Min Hu, BNRist, Department of Computer Science and Technology, Tsinghua University, China
CharacterGen is an efficient 3D character generation framework that takes a single input image and generates a high-quality 3D character mesh in a canonical pose, suitable for downstream rigging and animation workflows. The framework introduces a streamlined generation pipeline and an image-conditioned multi-view diffusion model that calibrates the input pose to a canonical form while retaining key attributes of the input image. A transformer-based, generalizable sparse-view reconstruction model then creates a detailed 3D model from the multi-view images, and a texture-back-projection strategy produces a high-quality texture map. A curated dataset of 13,746 anime characters, rendered in multiple poses and views, is used for training and evaluation. The method addresses challenges such as self-occlusion and pose ambiguity, and has been shown to generate 3D characters with high-quality shapes and textures. The paper reviews related work, details the method, and presents experiments evaluating the effectiveness and efficiency of CharacterGen.
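The abstract describes a three-stage pipeline: multi-view pose canonicalization, sparse-view reconstruction, and texture back-projection. The sketch below illustrates that flow in Python; the class names, interfaces, and placeholder internals are illustrative assumptions for readability, not the authors' actual models or API.

```python
# Hypothetical sketch of the CharacterGen pipeline described above.
# Stage interfaces and all internals are assumed placeholders.
import numpy as np


class MultiViewDiffusion:
    """Stage 1: image-conditioned diffusion that re-poses the character
    into a canonical pose and emits several consistent views."""

    def __init__(self, num_views: int = 4):
        self.num_views = num_views

    def canonicalize(self, image: np.ndarray) -> list[np.ndarray]:
        # Placeholder: a real model would denoise num_views latents
        # conditioned on the input image and canonical-pose guidance.
        return [image.copy() for _ in range(self.num_views)]


class SparseViewReconstructor:
    """Stage 2: transformer-based reconstructor that lifts the sparse
    canonical views to a 3D mesh (vertices and faces here)."""

    def reconstruct(self, views: list[np.ndarray]) -> tuple[np.ndarray, np.ndarray]:
        # Placeholder geometry: a single triangle stands in for the mesh.
        vertices = np.array([[0.0, 0.0, 0.0],
                             [1.0, 0.0, 0.0],
                             [0.0, 1.0, 0.0]])
        faces = np.array([[0, 1, 2]])
        return vertices, faces


def back_project_texture(views: list[np.ndarray],
                         vertices: np.ndarray) -> np.ndarray:
    """Stage 3: bake view colors into a texture map.
    Placeholder: averages the views instead of true UV back-projection."""
    stacked = np.stack([v.astype(np.float32) for v in views])
    return stacked.mean(axis=0)


if __name__ == "__main__":
    input_image = np.random.rand(256, 256, 3)  # stand-in for the single input image
    views = MultiViewDiffusion(num_views=4).canonicalize(input_image)
    verts, faces = SparseViewReconstructor().reconstruct(views)
    texture = back_project_texture(views, verts)
    print(verts.shape, faces.shape, texture.shape)
```

In this decomposition, canonicalizing the pose before reconstruction is what lets a single reconstructor handle characters photographed in arbitrary poses.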