L4GM: Large 4D Gaussian Reconstruction Model

14 Jun 2024 | Jiawei Ren, Kevin Xie, Ashkan Mirzaei, Hanxue Liang, Xiaohui Zeng, Karsten Kreis, Ziwei Liu, Antonio Torralba, Sanja Fidler, Seung Wook Kim, Huan Ling
L4GM is a novel 4D Large Reconstruction Model that generates animated 3D objects from a single-view video input in a single feed-forward pass, taking only a second. The model is based on a pre-trained 3D Large Reconstruction Model (LGM) that outputs 3D Gaussian ellipsoids from multiview images. L4GM extends LGM to take a sequence of frames as input and produce a 3D Gaussian representation for each frame. It adds temporal self-attention layers to learn temporal consistency and uses a per-timestep multiview rendering loss for training. The model is trained on a large-scale dataset of 12 million multiview videos of rendered animated 3D objects from Objaverse 1.0. L4GM is capable of reconstructing long videos and using learned interpolation to achieve high framerates.

It outperforms existing video-to-4D generation approaches on all quality metrics by a significant margin while being 100 to 1,000 times faster. L4GM also enables fast video-to-4D generation in combination with a multiview generative model. The model generalizes well to in-the-wild videos, producing high-quality animated 3D assets.
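The key architectural change described above is the temporal self-attention layer, which lets per-frame feature tokens exchange information across time so the reconstructed Gaussians stay consistent between frames. Below is a minimal NumPy sketch of that idea: each spatial token attends over the time axis only. All shapes and the function name are illustrative assumptions; in the actual model such layers are interleaved inside the pretrained LGM backbone, which is not reproduced here.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_self_attention(tokens, Wq, Wk, Wv):
    """Attend across the time axis, independently for each spatial token.

    tokens: (T, N, D) — T frames, N spatial tokens per frame, D channels.
    Hypothetical single-head layout for illustration only.
    """
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv          # each (T, N, D)
    # For each spatial location n, frames t attend to frames s.
    logits = np.einsum('tnd,snd->nts', q, k) / np.sqrt(q.shape[-1])
    attn = softmax(logits, axis=-1)                          # (N, T, T)
    return np.einsum('nts,snd->tnd', attn, v)                # back to (T, N, D)

rng = np.random.default_rng(0)
T, N, D = 8, 16, 32
x = rng.standard_normal((T, N, D))
Wq, Wk, Wv = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))
out = temporal_self_attention(x, Wq, Wk, Wv)
print(out.shape)
```

Because attention runs only along the time axis, the cost grows with the square of the frame count rather than the (much larger) token count, which is one plausible reason such a design keeps a sequence-level model fast.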