1 Apr 2024 | Zizhang Li, Dor Litvak, Ruining Li, Yunzhi Zhang, Tomas Jakab, Christian Rupprecht, Shangzhe Wu, Andrea Vedaldi, Jiajun Wu
The paper introduces 3D-Fauna, a method that learns a single pan-category deformable 3D model covering more than 100 quadruped species using only 2D internet images as training data. Given a single image of a quadruped, the model reconstructs an articulated, textured 3D mesh in a feed-forward pass within seconds, ready for animation and rendering. The key challenge is the scarcity of training images for rare species, which the method addresses with the Semantic Bank of Skinned Models (SBSM): a small set of base animal shapes discovered automatically by combining geometric inductive priors with semantic knowledge from an off-the-shelf self-supervised feature extractor. The authors also contribute a new large-scale dataset of diverse animal species. Training is unsupervised beyond self-supervised image features and object masks; the pipeline combines reconstruction losses, correspondences derived from the self-supervised features, and a mask discriminator that encourages realistic shapes from all viewpoints. Evaluated on multiple datasets, 3D-Fauna outperforms existing approaches both quantitatively and qualitatively, with especially large gains on diverse and rare species, marking a significant step toward learning 3D animal models directly from internet images.
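As a rough illustration (not the authors' implementation), the sketch below shows one way such a semantic bank could be realized: a small set of learnable base-shape codes is blended by softmax similarity between a frozen self-supervised image embedding (e.g. a DINO feature) and learnable semantic keys, so semantically similar species end up sharing shape components. The class name, dimensions, and temperature below are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticShapeBank(nn.Module):
    """Hypothetical sketch of a semantic bank of base shapes.

    A small set of learnable base-shape codes is indexed by learnable semantic
    keys. Given an image embedding from a frozen self-supervised encoder, the
    instance's base shape is a similarity-weighted blend of the bank entries.
    """

    def __init__(self, num_bases: int = 60, key_dim: int = 384, shape_dim: int = 128):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(num_bases, key_dim))           # semantic keys
        self.shape_codes = nn.Parameter(torch.randn(num_bases, shape_dim))  # base-shape codes

    def forward(self, image_feat: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
        # image_feat: (B, key_dim) embedding from a frozen feature extractor
        sim = F.normalize(image_feat, dim=-1) @ F.normalize(self.keys, dim=-1).T  # (B, K)
        weights = F.softmax(sim / temperature, dim=-1)                            # soft assignment over bases
        return weights @ self.shape_codes                                         # (B, shape_dim)


# Usage: the blended code would condition a shape decoder (not shown) that
# predicts the instance's base mesh before articulation and texturing.
bank = SemanticShapeBank()
feat = torch.randn(4, 384)   # e.g. frozen DINO embeddings for a batch of images
shape_code = bank(feat)      # (4, 128)
```

In such a design, the soft blending lets rare species borrow geometry from better-represented ones, which is the stated motivation for the semantic bank in the paper.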