Era3D: High-Resolution Multiview Diffusion using Efficient Row-wise Attention

29 May 2024 | Peng Li, Yuan Liu, Xiaoxiao Long, Feihu Zhang, Cheng Lin, Mengfei Li, Xingqun Qi, Shanghang Zhang, Wenhan Luo, Ping Tan, Wenping Wang, Qifeng Liu, Yike Guo
Era3D is a multiview diffusion method that generates high-resolution multiview images from a single-view image. It targets three limitations of existing methods: camera prior mismatch, inefficiency, and low resolution. A camera prediction module estimates the focal length and elevation of the input image, which allows the model to generate images without distortion. A row-wise attention mechanism fuses cross-view information efficiently and significantly reduces computational complexity.
Compared to state-of-the-art methods, Era3D generates high-quality multiview images at resolutions up to 512x512 while reducing computation by a factor of 12. Comprehensive experiments show that Era3D can reconstruct high-quality 3D meshes from diverse single-view inputs and outperforms baseline methods, achieving state-of-the-art performance in single-view 3D generation. Its key contributions are resolving distortion artifacts caused by inconsistent camera intrinsics, designing a regression and condition scheme for arbitrary camera inputs, and proposing row-wise multiview attention for efficient high-resolution generation. The method is evaluated on multiple datasets and shows superior reconstruction quality, pose estimation accuracy, and computational efficiency, which makes multiview diffusion efficient and scalable for high-resolution 3D reconstruction.
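To make the row-wise attention idea concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes that each pixel attends only to pixels with the same row index across all views, and it uses the raw features as queries, keys, and values in place of learned projections. The function names `row_wise_multiview_attention` and `attention_pair_counts` are hypothetical, chosen for illustration only.

```python
import numpy as np


def row_wise_multiview_attention(feats):
    """Self-attention restricted to tokens sharing a row index across views.

    feats: array of shape (V, H, W, C) holding C-dim features for V views.
    Returns an array of the same shape.
    Assumption: queries, keys and values are the features themselves
    (no learned projections), purely to keep the sketch short.
    """
    V, H, W, C = feats.shape
    # Group tokens by row: every row gathers its W pixels from all V views.
    rows = feats.transpose(1, 0, 2, 3).reshape(H, V * W, C)
    # Scaled dot-product scores, computed independently per row.
    scores = rows @ rows.transpose(0, 2, 1) / np.sqrt(C)  # (H, V*W, V*W)
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    out = weights @ rows  # (H, V*W, C)
    # Restore the (V, H, W, C) layout.
    return out.reshape(H, V, W, C).transpose(1, 0, 2, 3)


def attention_pair_counts(V, H, W):
    """Number of query-key pairs for dense vs. row-wise multiview attention."""
    dense = (V * H * W) ** 2      # every token attends to every token
    row_wise = H * (V * W) ** 2   # tokens attend only within their own row
    return dense, row_wise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(6, 8, 8, 16))  # 6 views, 8x8 feature map
    out = row_wise_multiview_attention(feats)
    print(out.shape)
    print(attention_pair_counts(6, 8, 8))
```

Under these assumptions the number of query-key pairs drops by a factor of H relative to dense attention over all views. This count is only illustrative and is not intended to reproduce the 12-times figure quoted in the summary, which depends on the baseline and measurement used by the authors.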