3 Dec 2019 | Shunsuke Saito, Zeng Huang, Ryota Natsume, Shigeo Morishima, Angjoo Kanazawa, Hao Li
PIFu is a pixel-aligned implicit function for high-resolution clothed human digitization. It recovers high-resolution 3D textured surfaces from a single input image and can digitize intricate variations in clothing, such as wrinkled skirts and high heels, as well as complex hairstyles. Both shape and texture are fully recovered, including largely unseen regions such as the back of the subject, and the method extends naturally to multi-view input images.

Compared to existing representations used for 3D deep learning, PIFu produces high-resolution surfaces, including largely unseen regions such as the back of a person. It is memory efficient, handles arbitrary topology, and produces a surface that is spatially aligned with the input image, and it extends naturally to an arbitrary number of views. The method achieves state-of-the-art performance on a public benchmark and outperforms prior work for clothed human digitization from a single image. The project website can be found at https://shunsukesaito.github.io/PIFu/.

PIFu introduces a memory-efficient, spatially aligned representation for 3D surfaces. It defines a surface as a level set of a function f, which allows the learned function to preserve the local detail present in the image, and its continuous nature allows detailed geometry of arbitrary topology to be generated in a memory-efficient manner. PIFu can also be cast as a general framework that extends to various co-domains such as RGB colors, and it handles single-view and multi-view input naturally, producing even higher-fidelity results when more views are available.

PIFu enables direct prediction of RGB colors on the surface geometry by defining s in Eq. 1 as an RGB vector field instead of a scalar field. This supports texturing of shapes with arbitrary topology under self-occlusion.
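Eq. 1 is referenced above but not reproduced; as a sketch of the pixel-aligned implicit function in its standard form (the notation is assumed from the original paper: X is a 3D query point, x = π(X) its 2D projection onto the image, F(x) the image feature sampled at x, and z(X) the depth of X in camera coordinates):

    % Pixel-aligned implicit function (sketch; notation assumed from the PIFu paper)
    f\bigl(F(x),\, z(X)\bigr) = s, \qquad x = \pi(X),
    \quad s \in \mathbb{R} \ \text{(occupancy field)} \quad \text{or} \quad s \in \mathbb{R}^{3} \ \text{(RGB field)}.

The reconstructed surface is then a level set of f, recovered for example by evaluating f on a dense 3D grid and extracting the iso-surface.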
PIFu extends to multi-view inputs by decomposing the implicit function f into a feature embedding function f1 and a multi-view reasoning function f2. The first function, f1, computes a feature embedding for each view in the 3D world coordinate system, which allows features from an arbitrary set of views to be aggregated; the second function, f2, takes the aggregated feature vector and makes a more informed prediction of the 3D surface and texture (a minimal sketch of this decomposition follows at the end).

PIFu achieves state-of-the-art reconstructions, both qualitatively and quantitatively in the reported metrics, and it is the first approach that can inpaint textures for shapes of arbitrary topology. Because it can take advantage of arbitrary additional views, it is particularly suitable for practical and efficient 3D modeling. It can generate textured 3D surfaces of a clothed person from a single RGB camera, moving closer toward monocular reconstruction of dynamic scenes from video without the need for a template model.
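To make the f1/f2 decomposition concrete, below is a minimal PyTorch-style sketch, not the authors' released implementation: the module structure, feature dimensions, and the use of average pooling over views are assumptions for illustration. f1 embeds each view's pixel-aligned image feature and depth into a per-view embedding, the embeddings are averaged over views, and f2 maps the pooled embedding to the final prediction (occupancy with out_dim=1, or RGB with out_dim=3).

    import torch
    import torch.nn as nn

    class MultiViewPIFu(nn.Module):
        """Sketch of f = f2(mean over views of f1(feature, depth)); sizes are illustrative."""

        def __init__(self, feat_dim=256, embed_dim=256, out_dim=1):
            super().__init__()
            # f1: per-view embedding of (pixel-aligned image feature, depth of the query point).
            self.f1 = nn.Sequential(
                nn.Linear(feat_dim + 1, embed_dim), nn.ReLU(),
                nn.Linear(embed_dim, embed_dim), nn.ReLU(),
            )
            # f2: multi-view reasoning on the aggregated embedding -> occupancy (or RGB).
            self.f2 = nn.Sequential(
                nn.Linear(embed_dim, embed_dim), nn.ReLU(),
                nn.Linear(embed_dim, out_dim),
            )

        def forward(self, view_feats, view_depths):
            # view_feats:  (num_views, num_points, feat_dim) image features sampled at each
            #              query point's 2D projection in every view (the pixel-aligned part).
            # view_depths: (num_views, num_points, 1) depth of each query point per view.
            embed = self.f1(torch.cat([view_feats, view_depths], dim=-1))
            pooled = embed.mean(dim=0)             # aggregate over an arbitrary number of views
            return torch.sigmoid(self.f2(pooled))  # per-point prediction, e.g. in/out probability

    # Example: query 1000 points using features sampled from 3 views.
    model = MultiViewPIFu()
    pred = model(torch.randn(3, 1000, 256), torch.randn(3, 1000, 1))  # shape (1000, 1)

Because the per-view embeddings are simply averaged, the same network can take one view or many, which is what allows fidelity to improve as additional views become available.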