GALA: Generating Animatable Layered Assets from a Single Scan


23 Jan 2024 | Taeksoo Kim*, Byungjun Kim*, Shunsuke Saito, Hanbyul Joo
GALA is a framework that takes a single-layer 3D mesh of a clothed human and decomposes it into complete multi-layered 3D assets. The outputs can then be combined with other assets to create novel clothed human avatars in any pose. Existing reconstruction approaches often treat clothed humans as a single layer of geometry, overlooking the inherent compositionality of humans with hairstyles, clothing, and accessories, which limits the utility of the resulting meshes for downstream applications. Decomposing a single-layer mesh into separate layers is challenging because it requires synthesizing plausible geometry and texture for severely occluded regions. Moreover, even after a successful decomposition, the meshes are not normalized in terms of pose and body shape, which prevents coherent composition with novel identities and poses.

To address these challenges, the authors leverage the general knowledge of a pretrained 2D diffusion model as a geometry and appearance prior for humans and other assets. They first separate the input mesh using a 3D surface segmentation extracted from multi-view 2D segmentations. They then synthesize the missing geometry of the different layers in both posed and canonical spaces using a novel pose-guided Score Distillation Sampling (SDS) loss. Once the high-fidelity 3D geometry has been inpainted, the same SDS loss is applied to its texture to obtain the complete appearance, including the initially occluded regions. Through a series of decomposition steps, they obtain multiple layers of 3D assets in a shared canonical space normalized in terms of pose and body shape, supporting effortless composition with novel identities and reanimation with novel poses.

The authors demonstrate the effectiveness of their approach on decomposition, canonicalization, and composition tasks compared to existing solutions. They also show that the proposed pose-guided SDS enables robust canonicalization even in challenging cases, outperforming existing methods. Lastly, they show garment transfer to create novel avatars from only a collection of single-layer clothed-human scans. Their contributions include proposing the new task of multi-layer decomposition and composition from a single-layer scan, presenting a pose-guided SDS loss for robust modeling of layered clothed humans in a canonical space for garment transfer and reposing from a single scan, and providing a comprehensive analysis of generating animatable layered assets from a single scan with a newly established evaluation protocol. They will release code to benchmark future research on this novel task.
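The core optimization signal in GALA is its pose-guided variant of Score Distillation Sampling. For readers unfamiliar with SDS, the sketch below illustrates a generic DreamFusion-style SDS gradient step in PyTorch; it is an illustrative assumption, not the authors' implementation. The `denoiser` callable, the `text_emb` conditioning, and the use of a zero tensor as the unconditional embedding are placeholders, and GALA's pose guidance (conditioning the pretrained diffusion model on the target pose) is only indicated in a comment.

```python
import torch

def sds_loss(denoiser, latents, text_emb, alphas_cumprod, guidance_scale=100.0):
    """Generic Score Distillation Sampling step (DreamFusion-style sketch).

    `denoiser(x_t, t, cond)` is assumed to predict the noise added at timestep t.
    In GALA this would be a pretrained 2D diffusion model, and the pose-guided
    variant would additionally condition it on a rendering of the target pose
    (that conditioning is omitted here).
    """
    b = latents.shape[0]
    # Sample a random timestep and noise, then diffuse the rendered latent/image.
    t = torch.randint(20, 980, (b,), device=latents.device)
    noise = torch.randn_like(latents)
    a_t = alphas_cumprod[t].view(b, 1, 1, 1)
    noisy = a_t.sqrt() * latents + (1.0 - a_t).sqrt() * noise

    with torch.no_grad():
        # Classifier-free guidance: mix conditional and unconditional predictions.
        # (A zero tensor stands in for the empty-prompt embedding here.)
        eps_cond = denoiser(noisy, t, text_emb)
        eps_uncond = denoiser(noisy, t, torch.zeros_like(text_emb))
        eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)

    # SDS gradient: w(t) * (predicted noise - injected noise), back-propagated to
    # the 3D parameters through the differentiable renderer (via `latents`).
    w = 1.0 - a_t
    grad = w * (eps - noise)
    # Return a surrogate loss whose gradient w.r.t. `latents` equals `grad`.
    return (grad.detach() * latents).sum()
```

In GALA, a loss of this form is applied first to the geometry of each decomposed layer in both posed and canonical spaces, and then again to its texture to complete the appearance of the initially occluded regions.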