CHRISTIAN REISER, University of Tübingen, Tübingen AI Center, Google Research, Germany
STEPHAN GARBIN, Google Research, United Kingdom
PRATUL P. SRINIVASAN, Google Research, United States of America
DOR VERBIN, Google Research, United States of America
RICHARD SZELISKI, Google Research, United States of America
BEN MILDENHALL, Google Research, United States of America
JONATHAN T. BARRON, Google Research, United States of America
PETER HEDMAN*, Google Research, United Kingdom
ANDREAS GEIGER*, University of Tübingen, Tübingen AI Center, Germany
The paper presents a method for mesh-based view synthesis that captures fine geometric details, such as leaves, branches, and grass, from multi-view images. The method uses a binary opacity grid representation instead of a continuous density field, allowing opacity values to transition discontinuously from zero to one at the surface. Multiple rays are cast per pixel to accurately model occlusion boundaries and subpixel structures without using semi-transparent voxels. The binary entropy of the opacity values is minimized to encourage them to binarize towards the end of training, facilitating the extraction of surface geometry. The recovered binary opacity grid is converted into a triangle mesh using a fusion-based strategy, followed by mesh simplification and appearance model fitting. The resulting meshes are compact enough for real-time rendering on mobile devices and achieve significantly higher view synthesis quality compared to existing mesh-based approaches. The method outperforms state-of-the-art volume-based methods in terms of quality and speed, while being more efficient than surface-based methods. The paper also discusses limitations and future work, including the need for further improvements in indoor scenes and the potential of concurrent work on UV mapping for high-detail meshes.
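To make the binarization step concrete, the following is a minimal sketch (not the authors' code) of a binary-entropy regularizer of the kind described above, assuming opacities are parameterized as a dense grid of logits; the names `opacity_logits`, `entropy_weight`, and `total_loss` are illustrative, and the weighting schedule is an assumption rather than a detail taken from the paper.

```python
import jax
import jax.numpy as jnp

def binary_entropy_loss(opacity_logits, eps=1e-6):
    """Mean binary entropy H(a) = -a*log(a) - (1-a)*log(1-a) of grid opacities.

    Minimizing this term pushes each opacity toward 0 or 1, so the grid
    binarizes toward the end of training and a surface can be extracted.
    """
    alpha = jax.nn.sigmoid(opacity_logits)
    alpha = jnp.clip(alpha, eps, 1.0 - eps)  # avoid log(0)
    entropy = -(alpha * jnp.log(alpha) + (1.0 - alpha) * jnp.log(1.0 - alpha))
    return jnp.mean(entropy)

def total_loss(opacity_logits, photometric_loss, entropy_weight):
    # Hypothetical combination: a reconstruction term plus the entropy
    # regularizer, whose weight would be ramped up late in training.
    return photometric_loss + entropy_weight * binary_entropy_loss(opacity_logits)
```

Driving the entropy to zero means every voxel ends up fully transparent or fully opaque, which is what allows the grid to be thresholded and meshed without semi-transparent voxels.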
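The multi-ray anti-aliasing idea can likewise be sketched as simple supersampling: several jittered rays are cast per pixel and their composited colors are averaged, so occlusion boundaries and subpixel structures are modeled by coverage rather than by semi-transparent voxels. Here `render_ray` is a hypothetical function that alpha-composites a single ray through the opacity grid given its subpixel image-plane position; this is a sketch of the general technique, not the paper's exact sampling scheme.

```python
import jax
import jax.numpy as jnp

def render_pixel(render_ray, pixel_xy, key, num_samples=16):
    """Average the colors of `num_samples` rays jittered within one pixel."""
    # Uniform jitter in [0, 1)^2 across the pixel footprint.
    offsets = jax.random.uniform(key, shape=(num_samples, 2))
    sample_positions = pixel_xy[None, :] + offsets  # subpixel sample points
    colors = jax.vmap(render_ray)(sample_positions)  # one color per ray
    return jnp.mean(colors, axis=0)  # box-filtered pixel color
```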