July 2024 | HAO-YANG PENG, JIA-PENG ZHANG, MENG-HAO GUO, YAN-PEI CAO, SHI-MIN HU
CharacterGen is a framework for efficiently generating 3D characters from a single input image. Given an image of a character in an arbitrary pose, it produces a high-quality 3D mesh in a canonical A-pose with a consistent appearance, suitable for downstream rigging and animation workflows. To address self-occlusion and pose ambiguity, the method canonicalizes the input pose and enforces appearance consistency across views. The pipeline combines an image-conditioned multi-view diffusion model, which generates consistent canonical-pose views from the single input image, with a transformer-based sparse-view reconstruction model that lifts these views to a 3D mesh; a texture-back-projection strategy then produces detailed texture maps. Training and evaluation use a curated dataset of 13,746 anime characters rendered in multiple poses and views.
CharacterGen is evaluated through quantitative and qualitative experiments as well as user studies, on both 2D multi-view generation and 3D character generation tasks. Compared with other image-prompt 3D generation methods, it delivers superior generation quality and view consistency, handles non-A-pose input images, and is particularly effective for stylized characters such as those common in anime. The entire generation process takes less than one minute.
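The three-stage pipeline described above can be sketched as follows. This is a minimal illustrative outline only: the function names, view count, and data structures are assumptions for exposition, not the authors' actual API.

```python
# Hypothetical sketch of the CharacterGen pipeline stages; names and
# shapes are illustrative assumptions, not the paper's implementation.

def multi_view_diffusion(input_image: str) -> list[str]:
    # Stage 1: the image-conditioned multi-view diffusion model generates
    # several consistent views of the character in a canonical A-pose.
    return [f"canonical_view_{i}" for i in range(4)]  # e.g. 4 views

def sparse_view_reconstruction(views: list[str]) -> dict:
    # Stage 2: the transformer-based sparse-view reconstruction model
    # lifts the generated views to a 3D character mesh.
    return {"mesh": "canonical_a_pose_mesh"}

def texture_back_projection(character: dict, views: list[str]) -> dict:
    # Stage 3: back-project the generated view images onto the mesh
    # surface to produce a detailed UV texture map.
    character["texture"] = f"uv_map_from_{len(views)}_views"
    return character

def character_gen(input_image: str) -> dict:
    # Full pipeline: single image -> A-pose mesh with texture.
    views = multi_view_diffusion(input_image)
    character = sparse_view_reconstruction(views)
    return texture_back_projection(character, views)

result = character_gen("single_character_image.png")
```

The staged design mirrors the paper's key idea: 2D pose canonicalization and multi-view consistency are resolved first, so the reconstruction stage only needs to solve a well-posed sparse-view problem.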