Era3D: High-Resolution Multiview Diffusion using Efficient Row-wise Attention

Era3D is a multiview diffusion method that generates high-resolution multiview images from a single input image. It targets three limitations of existing multiview generation techniques: camera prior mismatch, inefficiency, and low resolution, all of which degrade the quality of the generated views. Era3D introduces a diffusion-based camera prediction module that estimates the focal length and elevation of the input image, allowing it to generate images without shape distortions. In addition, a row-wise attention layer enforces epipolar priors and fuses cross-view information efficiently, reducing computational complexity and memory consumption while maintaining high-quality multiview generation.

Comprehensive experiments demonstrate that Era3D can reconstruct high-quality, detailed 3D meshes from diverse single-view input images, outperforming baseline methods in both quality and efficiency. The method is evaluated on several datasets, including the Google Scanned Objects (GSO) dataset, and shows superior performance on novel view synthesis and 3D reconstruction tasks.
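
The idea behind row-wise attention can be illustrated with a small NumPy sketch. It is not the authors' implementation: it assumes the generated views are arranged so that epipolar lines coincide with image rows, so a pixel only needs to attend to pixels in the same row across all views. The function names, the single-head formulation, and the projection matrices `w_q`, `w_k`, `w_v` are hypothetical and chosen for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def row_wise_attention(feats, w_q, w_k, w_v):
    """Single-head attention restricted to matching rows across views.

    feats: (N, H, W, C) feature maps for N views (illustrative layout).
    w_q, w_k, w_v: (C, D) projection matrices (hypothetical parameters).
    Returns an array of shape (N, H, W, D).
    """
    n, h, w, c = feats.shape
    # Group tokens by row index: each row holds N * W tokens from all views.
    rows = feats.transpose(1, 0, 2, 3).reshape(h, n * w, c)
    q, k, v = rows @ w_q, rows @ w_k, rows @ w_v
    # Scores are computed independently per row: (H, N*W, N*W).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    out = softmax(scores, axis=-1) @ v
    # Restore the (N, H, W, D) layout.
    return out.reshape(h, n, w, out.shape[-1]).transpose(1, 0, 2, 3)


def attention_pairs(n_views, size):
    """Count query-key pairs for square feature maps of side `size`."""
    dense = (n_views * size * size) ** 2       # every token attends to every token
    row_wise = size * (n_views * size) ** 2    # attention only within each row
    return dense, row_wise
```

Under these assumptions, dense attention over N views of side S scores N²S⁴ token pairs, while the row-wise variant scores N²S³, a reduction by a factor of S. Calling `attention_pairs(6, 512)` with these illustrative values gives a ratio of 512 between the two counts.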

29 May 2024 | Peng Li, Yuan Liu, Xiaoxiao Long, Feihu Zhang, Cheng Lin, Mengfei Li, Xingqun Qi, Shanghang Zhang, Wenhan Luo, Ping Tan, Wenping Wang, Qifeng Liu, Yike Guo