GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields


29 Apr 2021 | Michael Niemeyer, Andreas Geiger
GIRAFFE is a method for controllable image synthesis that represents scenes as compositional generative neural feature fields. By incorporating a compositional 3D scene representation into the generative model, it makes image synthesis more controllable: trained on unstructured and unposed image collections without additional supervision, the model disentangles individual objects from the background as well as their shapes and appearances. Combining this scene representation with a neural rendering pipeline yields fast and realistic image synthesis.

The model allows objects to be translated and rotated in the scene and the camera pose to be changed. Experiments show that it can generate scenes containing more objects than were present in the training data and supports operations such as circular translations and adding objects at test time. In terms of image quality, measured by the Fréchet Inception Distance (FID), it matches or outperforms baseline methods, and it renders faster than previous methods. Because objects are disentangled from the background, each object can be controlled independently without affecting the rest of the scene.

The key contributions are the compositional 3D scene representation and its integration with a neural rendering pipeline for efficient, realistic image synthesis. Since the method is trained from raw image collections without additional supervision, it is applicable to a wide range of settings.
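To make the pipeline concrete, below is a minimal PyTorch sketch of the core idea: each object is a small MLP feature field conditioned on shape and appearance codes and posed by an affine transform, per-object densities and features are composited (densities summed, features density-weighted), and the result is volume-rendered into a per-pixel feature that a 2D CNN renderer would then upsample to RGB. This is not the authors' reference implementation; all module names, dimensions, and the simplified pose handling are assumptions for illustration.

```python
# Illustrative sketch of compositional generative neural feature fields.
# Module names, dimensions, and the pose convention are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ObjectFeatureField(nn.Module):
    """MLP mapping a 3D point plus shape/appearance codes to (density, feature)."""

    def __init__(self, z_dim=64, feat_dim=32, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(3 + 2 * z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.sigma_head = nn.Linear(hidden, 1)        # volume density
        self.feat_head = nn.Linear(hidden, feat_dim)  # feature vector instead of RGB

    def forward(self, x, z_shape, z_app):
        h = self.trunk(torch.cat([x, z_shape, z_app], dim=-1))
        sigma = F.softplus(self.sigma_head(h)).squeeze(-1)   # (S,)
        return sigma, self.feat_head(h)                      # (S,), (S, feat_dim)


def to_object_frame(x, scale, rotation, translation):
    """Map world-space points into an object's canonical frame.

    Changing `rotation`/`translation`/`scale` at test time moves only this object
    (row-vector convention: x_world = x_obj @ R.T + t).
    """
    return ((x - translation) @ rotation) / scale


def composite(sigmas, feats):
    """Combine K object fields: densities add, features are density-weighted."""
    sigma = sigmas.sum(dim=0)                              # (S,)
    weights = sigmas / (sigma.unsqueeze(0) + 1e-8)         # (K, S)
    feat = (weights.unsqueeze(-1) * feats).sum(dim=0)      # (S, C)
    return sigma, feat


def volume_render_features(sigma, feat, deltas):
    """Alpha-composite per-sample features along one ray into a pixel feature."""
    alpha = 1.0 - torch.exp(-sigma * deltas)                               # (S,)
    trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alpha + 1e-8]), dim=0)[:-1]
    weights = trans * alpha                                                # (S,)
    return (weights.unsqueeze(-1) * feat).sum(dim=0)                       # (C,)


if __name__ == "__main__":
    torch.manual_seed(0)
    field = ObjectFeatureField()
    S = 16                                      # samples along one camera ray
    x = torch.rand(S, 3) * 2 - 1                # sample points on the ray
    deltas = torch.full((S,), 2.0 / S)          # spacing between samples

    sigmas, feats = [], []
    for _ in range(2):                          # two objects (weights shared here for brevity)
        z_shape = torch.randn(1, 64).expand(S, -1)   # per-object latent codes
        z_app = torch.randn(1, 64).expand(S, -1)
        R = torch.eye(3)                              # controllable object rotation
        x_obj = to_object_frame(x, scale=1.0, rotation=R, translation=torch.zeros(3))
        s, f = field(x_obj, z_shape, z_app)
        sigmas.append(s)
        feats.append(f)

    sigma, feat = composite(torch.stack(sigmas), torch.stack(feats))
    pixel_feature = volume_render_features(sigma, feat, deltas)
    print(pixel_feature.shape)                  # torch.Size([32]); fed to a 2D CNN renderer
```

Rendering features at low resolution and leaving the upsampling to a 2D convolutional renderer is what keeps synthesis fast: only a coarse feature image needs to be volume-rendered, while the final high-resolution RGB output comes from the comparatively cheap 2D network.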