GIRAFFE is a novel method for controllable image synthesis that represents scenes as compositional generative neural feature fields. By incorporating a compositional 3D scene representation into the generative model, it disentangles individual objects from the background, and their shapes from their appearances, without explicit supervision.

The scene representation is combined with a neural rendering pipeline to achieve fast and realistic image synthesis. The model is trained on raw, unstructured image collections, yet at test time it can translate and rotate individual objects, change the camera pose, and compose novel scenes containing more objects than appeared in the training data. Training uses a non-saturating GAN objective with an R1 gradient penalty for stability.

Experiments on a range of datasets show that GIRAFFE outperforms existing methods in image quality and controllability, generalizes beyond the training data, and handles complex real-world scenes with cluttered backgrounds. The key contributions are the compositional 3D scene representation and its integration with a neural rendering pipeline, which together yield high-quality images and fine-grained control over the generated scenes.
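To make the compositional representation concrete, the sketch below shows the density-weighted compositing rule GIRAFFE uses to merge per-object feature fields, along with the per-object affine pose whose inversion maps world points into each object's canonical frame. This is a minimal PyTorch sketch, not the authors' code: the function names (`compose_fields`, `to_object_frame`) and tensor shapes are illustrative assumptions, and the volume-rendering and 2D upsampling stages are omitted.

```python
import torch

def compose_fields(sigmas, feats, eps=1e-8):
    # Density-weighted composition of N per-object fields at shared 3D points:
    #   sigma = sum_i sigma_i,   f = (1 / sigma) * sum_i sigma_i * f_i
    # sigmas: (N, P) densities; feats: (N, P, C) feature vectors (shapes assumed).
    sigma = sigmas.sum(dim=0)                       # (P,) total density
    f = (sigmas.unsqueeze(-1) * feats).sum(dim=0) / (sigma.unsqueeze(-1) + eps)
    return sigma, f                                 # composite field values

def to_object_frame(x, R, s, t):
    # Map world points x (P, 3) into one object's canonical frame by inverting
    # the per-object affine pose k(x) = R @ diag(s) @ x + t. Editing R and t at
    # test time rotates and translates that object in the rendered scene.
    return ((x - t) @ R) / s                        # (x - t) @ R == R^T (x - t)
```

Each object's feature field is evaluated on `to_object_frame(x, ...)` under its own pose, the resulting densities and features are merged with `compose_fields`, and the composite field is volume-rendered. Because the poses are explicit inputs to the generator, object-level controllability comes essentially for free.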
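The training objective mentioned above is standard and easy to state precisely. Below is a minimal PyTorch sketch of the non-saturating GAN losses with an R1 gradient penalty on real images; `discriminator` is assumed to map image batches to unnormalized logits, and the penalty weight `gamma` (10.0 here) is illustrative rather than taken from the paper.

```python
import torch
import torch.nn.functional as F

def d_loss_nonsaturating_r1(discriminator, x_real, x_fake, gamma=10.0):
    # Discriminator loss: non-saturating logistic loss plus R1 penalty,
    # i.e. -log sigmoid(D(real)) - log sigmoid(-D(fake)) + gamma/2 * ||grad D(real)||^2.
    x_real = x_real.detach().requires_grad_(True)
    d_real = discriminator(x_real)
    d_fake = discriminator(x_fake.detach())
    loss = F.softplus(-d_real).mean() + F.softplus(d_fake).mean()
    # R1: squared gradient norm of D's output w.r.t. the real images only.
    grad, = torch.autograd.grad(d_real.sum(), x_real, create_graph=True)
    r1 = grad.flatten(1).pow(2).sum(1).mean()
    return loss + 0.5 * gamma * r1

def g_loss_nonsaturating(discriminator, x_fake):
    # Generator loss: minimize softplus(-D(fake)) == -log sigmoid(D(fake)).
    return F.softplus(-discriminator(x_fake)).mean()
```

The R1 term regularizes the discriminator's gradient on real data only, which is known to stabilize GAN training without directly constraining the generator.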