30 Mar 2021 | Katja Schwarz*, Yiyi Liao*, Michael Niemeyer, Andreas Geiger
This paper introduces Generative Radiance Fields (GRAF), a generative model for high-resolution, 3D-aware image synthesis. GRAF represents each scene as a continuous function, a radiance field, rather than a discretized voxel grid. This continuous representation avoids the discretization artifacts that limit voxel-based approaches, disentangles camera pose from scene content, and degrades gracefully in the presence of reconstruction ambiguity. The model is trained from unposed 2D images alone, without any 3D supervision, and uses a multi-scale patch-based discriminator to make adversarial training tractable at high resolutions.

Evaluated on both synthetic and real-world datasets, GRAF produces high-fidelity images with strong 3D consistency, outperforming state-of-the-art methods in visual fidelity and in generalization to high spatial resolutions. Because shape and appearance are encoded by separate latent codes, the geometry and texture of generated objects can be controlled independently. The paper also examines the implications of learned projections and shows that the multi-scale discriminator is important for achieving high-quality results. Compared to voxel-based approaches, GRAF generates high-resolution images with better multi-view consistency, although the current results are limited to simple scenes containing a single object; the authors suggest that inductive biases such as depth maps or symmetry could extend the model to more complex real-world scenarios.
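To make the core idea concrete, here is a minimal PyTorch sketch of what a conditional radiance field could look like. This is an illustration, not the authors' released code: the class name `ConditionalRadianceField`, the layer widths, and the argument names `z_shape` and `z_app` are assumptions. What the sketch tries to capture is the paper's disentanglement property: density is predicted from position and the shape code, while color additionally depends on the viewing direction and an appearance code.

```python
import torch
import torch.nn as nn

def positional_encoding(x, num_freqs):
    """Map each coordinate to sin/cos features at exponentially spaced frequencies."""
    freqs = (2.0 ** torch.arange(num_freqs, device=x.device)) * torch.pi
    angles = x[..., None] * freqs                       # (..., dim, num_freqs)
    return torch.cat([angles.sin(), angles.cos()], dim=-1).flatten(-2)

class ConditionalRadianceField(nn.Module):
    """g(x, d, z_shape, z_app) -> (sigma, c): density from position + shape code,
    color from trunk features + viewing direction + appearance code.
    Hypothetical sketch; layer sizes are illustrative assumptions."""

    def __init__(self, z_dim=128, hidden=256, pos_freqs=10, dir_freqs=4):
        super().__init__()
        self.pos_freqs, self.dir_freqs = pos_freqs, dir_freqs
        self.trunk = nn.Sequential(
            nn.Linear(3 * 2 * pos_freqs + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.sigma_head = nn.Linear(hidden, 1)
        self.color_head = nn.Sequential(
            nn.Linear(hidden + 3 * 2 * dir_freqs + z_dim, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid(),    # RGB in [0, 1]
        )

    def forward(self, x, d, z_shape, z_app):
        # x, d: (N, 3) sample positions and ray directions;
        # z_shape, z_app: (N, z_dim) latent codes, broadcast per sample.
        h = self.trunk(torch.cat([positional_encoding(x, self.pos_freqs), z_shape], -1))
        sigma = torch.relu(self.sigma_head(h))          # non-negative density
        c = self.color_head(
            torch.cat([h, positional_encoding(d, self.dir_freqs), z_app], -1))
        return sigma, c
```

At generation time, the shape and appearance codes are sampled from a prior, a camera pose is drawn at random, and the field is queried at sample points along each camera ray.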
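Given per-sample densities and colors along each ray, pixel colors are obtained by numerically integrating the standard volume rendering equation, as in NeRF. A minimal sketch of that compositing step, with assumed tensor shapes (R rays, S samples per ray):

```python
import torch

def composite_rays(sigma, color, deltas):
    """Alpha-composite samples along each ray into pixel colors.

    sigma:  (R, S)    non-negative densities
    color:  (R, S, 3) RGB at each sample
    deltas: (R, S)    distances between adjacent samples
    """
    alpha = 1.0 - torch.exp(-sigma * deltas)            # opacity per sample
    # Transmittance: probability the ray reaches each sample unoccluded.
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], dim=1),
        dim=1)[:, :-1]
    weights = alpha * trans                             # contribution per sample
    return (weights[..., None] * color).sum(dim=1)      # (R, 3) pixel colors
```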
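The multi-scale patch-based discriminator is what keeps high-resolution training affordable: instead of rendering full images, the generator renders only a sparse K x K grid of rays whose scale and position are drawn at random, so patches spanning most of the image provide global context while small patches capture fine detail, all at a constant cost of K^2 ray queries. A sketch of that sampling step, where the function name and the scale range are assumptions:

```python
import torch

def sample_patch_coords(img_size, patch_size=32, min_scale=0.25, max_scale=1.0):
    """Sample a K x K grid of pixel coordinates at a random scale and position."""
    s = torch.empty(1).uniform_(min_scale, max_scale).item()  # random patch scale
    extent = s * (img_size - 1)                               # patch width in pixels
    corner = torch.rand(2) * (img_size - 1 - extent)          # random top-left corner
    lin = torch.linspace(0, extent, patch_size)               # K samples per axis
    ys, xs = torch.meshgrid(corner[1] + lin, corner[0] + lin, indexing="ij")
    return torch.stack([xs, ys], dim=-1)                      # (K, K, 2) pixel coords
```

Both the rendered fake patch and a real patch cropped at the same coordinates are fed to the discriminator, so the discriminator always operates on a fixed K x K input regardless of image resolution.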
The paper also addresses the broader impact of 3D-aware generative models, highlighting their potential applications in virtual reality, data augmentation, and robotics, while also acknowledging the risks associated with generating photorealistic 3D scenarios, such as the potential for manipulation and misleading content.