11 Mar 2024 | Yifu Tao, Yash Bhalgat, Lanke Frank Tarimo Fu, Matias Mattamala, Nived Chebrolu, and Maurice Fallon
SiLVR is a scalable lidar-visual reconstruction system that integrates neural radiance fields (NeRF) with lidar data to produce high-quality, geometrically accurate 3D reconstructions with photorealistic textures. Fusing the two modalities overcomes the limitations of each sensor on its own: lidar scans are metrically accurate but sparse, while vision-only reconstruction is dense but fragile in textureless areas. To this end, SiLVR incorporates geometric constraints from lidar measurements, namely depth and surface normals, as regularisation terms in the neural field's training objective, which improves reconstruction quality precisely where photometric supervision is weakest. A sketch of this regularisation idea follows.
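The summary describes the regularisation only at a high level, but the general pattern is to add lidar-derived loss terms on top of the standard NeRF photometric loss. The sketch below is a minimal illustration under assumptions: the function name, the loss weights `lambda_d` and `lambda_n`, the use of volume-rendered expected depth, and the per-pixel validity mask are all hypothetical choices, not SiLVR's actual implementation.

```python
import torch
import torch.nn.functional as F

def lidar_regularised_loss(rgb_pred, rgb_gt,
                           depth_pred, depth_lidar,
                           normal_pred, normal_lidar,
                           valid_mask,
                           lambda_d=0.1, lambda_n=0.05):
    """Hypothetical NeRF loss with lidar depth/normal regularisation.

    rgb_pred, rgb_gt:          (N, 3) rendered vs. observed pixel colours
    depth_pred:                (N,)   expected depth from volume rendering
    depth_lidar:               (N,)   lidar depth reprojected into the image
    normal_pred, normal_lidar: (N, 3) unit surface normals
    valid_mask:                (N,)   True where a lidar return exists
    """
    # Standard NeRF photometric term, applied at every pixel.
    loss_rgb = F.mse_loss(rgb_pred, rgb_gt)

    # Depth regularisation: penalise deviation from lidar depth,
    # but only at pixels that actually have a lidar return.
    loss_depth = (depth_pred - depth_lidar).abs()[valid_mask].mean()

    # Normal regularisation: encourage rendered normals to align with
    # normals estimated from the lidar point cloud (1 - cosine similarity).
    cos = F.cosine_similarity(normal_pred, normal_lidar, dim=-1)
    loss_normal = (1.0 - cos)[valid_mask].mean()

    return loss_rgb + lambda_d * loss_depth + lambda_n * loss_normal
```

Because both extra terms are masked by lidar visibility, the photometric loss still drives the reconstruction wherever no lidar return is available.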
To scale to large environments and long trajectories, SiLVR employs submapping, partitioning the reconstruction into multiple neural fields. It also integrates a lidar SLAM system, which supplies a metric-scale trajectory while reducing computation time compared to structure-from-motion. The method is demonstrated with a multi-camera lidar sensor suite carried by a legged robot, a drone, and a handheld device in industrial and urban environments, and it is evaluated on real-world large-scale outdoor datasets captured from these platforms. The key contributions are a dense textured 3D reconstruction system whose geometry is comparable in accuracy to lidar while supporting photorealistic novel view synthesis, the integration with lidar SLAM for metric-scale trajectories at reduced computation time, and the submapping scheme that scales to large outdoor environments. Compared with vision-only and lidar-only baselines, SiLVR achieves superior accuracy, completeness, and visual quality; reconstructions become more complete and more accurate when multiple cameras are used, and submapping keeps the approach effective over long trajectories. A minimal illustration of the submap partitioning idea follows.
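One way to picture the submapping scheme is to split the SLAM trajectory into overlapping chunks and train one neural field per chunk. The partitioning criterion below, a fixed travelled-distance budget per submap with a fixed overlap, is an assumption for illustration; the paper's actual partitioning strategy and all names here are hypothetical.

```python
import numpy as np

def split_into_submaps(poses, max_travel=50.0, overlap=10.0):
    """Partition a trajectory into overlapping submaps (hypothetical
    criterion: cap the distance travelled within each submap).

    poses:      (N, 3) sensor positions along the SLAM trajectory, in metres
    max_travel: travel-distance budget per submap
    overlap:    distance shared by consecutive submaps, so that
                neighbouring reconstructions blend at their boundaries
    Returns a list of (start_idx, end_idx) index ranges into `poses`.
    """
    # Cumulative distance travelled along the trajectory.
    steps = np.linalg.norm(np.diff(poses, axis=0), axis=1)
    cum = np.concatenate([[0.0], np.cumsum(steps)])

    submaps, start = [], 0
    while start < len(poses) - 1:
        # Extend this submap until its travel budget is used up.
        end = min(int(np.searchsorted(cum, cum[start] + max_travel)),
                  len(poses) - 1)
        submaps.append((start, end))
        if end == len(poses) - 1:
            break
        # The next submap starts `overlap` metres before this one ends.
        start = int(np.searchsorted(cum, cum[end] - overlap))
    return submaps
```

Each index range would then get its own neural field, trained with a lidar-regularised loss like the one sketched earlier and placed in a common frame by the metric-scale SLAM trajectory.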