23 Jul 2024 | Cameron Smith*, David Charatan*, Ayush Tewari, and Vincent Sitzmann
FlowMap is an end-to-end differentiable method that estimates camera poses, camera intrinsics, and dense per-frame depth maps from a video sequence. It minimizes a simple least-squares objective that compares the optical flow induced by its depth, intrinsics, and pose estimates against correspondences obtained from off-the-shelf optical flow and point tracking. Rather than treating depth, intrinsics, and pose as free variables, FlowMap uses differentiable, feed-forward re-parameterizations of all three that make this objective well suited to first-order optimization. As a result, it is optimized per-scene with plain gradient descent, is compatible with standard deep learning pipelines, and constitutes a complete departure from conventional SfM, relying on none of its techniques.
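As a concrete illustration, below is a minimal sketch of such a flow-reprojection objective under a pinhole camera model. The function names, tensor shapes, and unweighted dense L2 form are assumptions for illustration, not FlowMap's exact implementation:

```python
# Minimal sketch of a flow-reprojection loss, assuming a pinhole camera.
# Names (induced_flow, flow_loss) and the unweighted L2 form are
# illustrative; the paper's formulation may weight correspondences.
import torch

def induced_flow(depth, K, R, t):
    """Flow from frame i to i+1 induced by depth, intrinsics, and pose.

    depth: (H, W) depth map for frame i
    K:     (3, 3) camera intrinsics
    R, t:  rotation (3, 3) and translation (3,) mapping frame i to i+1
    """
    H, W = depth.shape
    v, u = torch.meshgrid(
        torch.arange(H, dtype=depth.dtype),
        torch.arange(W, dtype=depth.dtype),
        indexing="ij",
    )
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1)     # (H, W, 3)
    rays = pix @ torch.linalg.inv(K).T                        # unproject pixels
    points = rays * depth[..., None]                          # 3D points, frame i
    points_next = points @ R.T + t                            # move to frame i+1
    proj = points_next @ K.T                                  # reproject
    pix_next = proj[..., :2] / proj[..., 2:3].clamp(min=1e-6)
    return pix_next - pix[..., :2]                            # (H, W, 2) flow

def flow_loss(depth, K, R, t, flow_est):
    """Least-squares discrepancy against off-the-shelf optical flow."""
    return (induced_flow(depth, K, R, t) - flow_est).square().mean()
```

Summing this residual over consecutive frame pairs yields a scalar objective through which gradients can flow to whatever parameterizes depth, intrinsics, and pose.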
FlowMap outperforms prior gradient-descent-based bundle adjustment methods and performs on par with COLMAP, the state-of-the-art SfM method, on 360° novel view synthesis: the depth and camera parameters it estimates enable photo-realistic reconstruction via 3D Gaussian Splatting of quality comparable to that achieved with COLMAP. It is robust on popular real-world datasets, including challenging object-centric 360° trajectories, and ablations show that its design choices are necessary for high-quality results. Implemented in PyTorch, FlowMap runs in about 3 minutes on short sequences and under 20 minutes on long ones.

Finally, because FlowMap is fully differentiable with respect to its per-frame depth estimates, it can serve as a building block for new self-supervised monocular depth estimators, deep-learning-based multi-view-geometry methods, and methods for generalizable novel view synthesis, opening the door to training on unannotated, internet-scale video data.
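This differentiability hinges on pose entering the objective through a closed-form solve rather than as a free variable, so that gradients from the flow loss reach the underlying depth estimates. Below is a minimal sketch of one such differentiable pose solve, a weighted orthogonal Procrustes (Kabsch) alignment of matched 3D points; the interface and weighting here are simplified assumptions, not the paper's exact formulation:

```python
# Minimal sketch of a closed-form, differentiable rigid alignment
# (weighted Kabsch / orthogonal Procrustes). Because every step is a
# differentiable tensor op, gradients flow from (R, t) back into the
# 3D points and hence into the depth estimates that produced them.
import torch

def procrustes_pose(p, q, w=None):
    """Least-squares rigid transform (R, t) with q ≈ p @ R.T + t.

    p, q: (N, 3) matched 3D points in frames i and i+1
    w:    optional (N,) non-negative correspondence weights
    """
    if w is None:
        w = torch.ones(p.shape[0], dtype=p.dtype)
    w = w / w.sum()
    p_mean = (w[:, None] * p).sum(dim=0)
    q_mean = (w[:, None] * q).sum(dim=0)
    p_c, q_c = p - p_mean, q - q_mean
    # Weighted cross-covariance, then SVD-based rotation estimate.
    H = (w[:, None] * p_c).T @ q_c
    U, S, Vt = torch.linalg.svd(H)
    d = torch.sign(torch.det(Vt.T @ U.T))  # guard against reflections
    D = torch.diag(torch.stack([torch.ones_like(d), torch.ones_like(d), d]))
    R = Vt.T @ D @ U.T
    t = q_mean - R @ p_mean
    return R, t
```

In a pipeline like the one described above, `p` and `q` would come from unprojecting consecutive depth maps and matching points via the estimated flow, so each pose is a function of depth rather than an independently optimized variable.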