Deep Convolutional Inverse Graphics Network

22 Jun 2015 | Tejas D. Kulkarni*, Will Whitney*, Pushmeet Kohli, Joshua B. Tenenbaum
The paper introduces the Deep Convolutional Inverse Graphics Network (DC-IGN), a model designed to learn an interpretable representation of images that is disentangled with respect to transformations such as out-of-plane rotations and lighting variations. The DC-IGN is composed of multiple layers of convolution and de-convolution operators and is trained using the Stochastic Gradient Variational Bayes (SGVB) algorithm. The model encourages neurons in the *graphics code* layer to represent specific transformations, such as pose or light, by using a training procedure that varies only one transformation at a time within each mini-batch. Given a single input image, the model can generate new images with variations in pose and lighting. The paper presents qualitative and quantitative results demonstrating the model's efficacy in learning a 3D rendering engine. The authors also discuss related work and propose future directions for extending the model to more complex and real-world visual scenes with moving objects.
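
To make the idea concrete, below is a minimal PyTorch sketch of a DC-IGN-style convolutional variational autoencoder together with the mini-batch clamping trick used to disentangle the code. This is not the authors' implementation: the 64x64 input resolution, layer widths, 200-dimensional code, and the assignment of particular latent indices to azimuth, elevation, and light are illustrative assumptions.

```python
# Minimal sketch (assumptions noted above), not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DCIGN(nn.Module):
    def __init__(self, z_dim=200):
        super().__init__()
        # Encoder: stacked convolutions mapping a 1x64x64 image to the
        # mean and log-variance of the latent "graphics code".
        self.enc = nn.Sequential(
            nn.Conv2d(1, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(64, 128, 5, stride=2, padding=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(128 * 8 * 8, z_dim)
        self.fc_logvar = nn.Linear(128 * 8 * 8, z_dim)
        # Decoder: de-convolutions mapping the code back to an image.
        self.fc_dec = nn.Linear(z_dim, 128 * 8 * 8)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def encode(self, x):
        h = self.enc(x)
        return self.fc_mu(h), self.fc_logvar(h)

    def decode(self, z):
        h = self.fc_dec(z).view(-1, 128, 8, 8)
        return self.dec(h)

    def forward(self, x):
        mu, logvar = self.encode(x)
        # SGVB reparameterisation: z = mu + sigma * eps, eps ~ N(0, I).
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.decode(z), mu, logvar


def vae_loss(recon, x, mu, logvar):
    # Standard SGVB objective: reconstruction term plus the KL divergence
    # between the approximate posterior and a unit Gaussian prior.
    rec = F.binary_cross_entropy(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl


def clamp_code(z, active_idx):
    # Disentangling trick (sketched): in a mini-batch where only one scene
    # variable changes, replace every latent except the one assigned to
    # that variable with its mini-batch mean, so only the designated unit
    # can explain the variation.
    z_clamped = z.mean(dim=0, keepdim=True).expand_as(z).clone()
    z_clamped[:, active_idx] = z[:, active_idx]
    return z_clamped
```

In this sketch, training would proceed on mini-batches in which only one scene variable (say, azimuth) changes; `clamp_code` would then be applied to the sampled code, with `active_idx` set to the latent unit reserved for that variable, before decoding and computing `vae_loss`. Which index corresponds to which variable is a training-time convention, not something the network discovers on its own.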