18 Mar 2024 | Sungphill Moon*, Hyeontae Son*, Dongcheol Hur, Sangwook Kim
GenFlow is a method for 6D pose refinement of novel objects, leveraging 3D shape information to improve accuracy and generalization. The method estimates optical flow between a rendered image and an observed image, iteratively refining the 6D pose. It uses a shape-constrained recurrent flow framework, incorporating a differentiable PnP solver and correlation lookup based on pose-induced flow. A cascade network architecture is designed to exploit multi-scale correlations and coarse-to-fine refinement, enhancing performance. GenFlow outperforms existing methods on unseen object pose estimation benchmarks, achieving competitive results for seen objects without fine-tuning. The method is trained on a large synthetic dataset and validated on multiple datasets from the BOP challenge. It demonstrates state-of-the-art performance in 6D pose estimation for both RGB and RGB-D inputs.
The method's effectiveness is supported by ablation studies showing the importance of shape constraints, confidence factorization, and cascade architecture in improving pose estimation accuracy.
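To make the term "pose-induced flow" concrete, the sketch below computes the 2D displacement of each 3D model point when the object pose changes from the rendered pose to an updated pose. It is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a pinhole camera without skew or distortion, and the names `project` and `pose_induced_flow`, along with all numeric values, are hypothetical.

```python
def project(point, R, t, K):
    """Project one 3D object-frame point to pixel coordinates (pinhole, no skew)."""
    # Object frame -> camera frame: X_cam = R @ X + t
    x, y, z = (sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3))
    fx, fy, cx, cy = K[0][0], K[1][1], K[0][2], K[1][2]
    # Perspective divide followed by the camera intrinsics
    return (fx * x / z + cx, fy * y / z + cy)


def pose_induced_flow(points, R_src, t_src, R_tgt, t_tgt, K):
    """Per-point 2D displacement caused by moving the pose from src to tgt."""
    flow = []
    for p in points:
        u0, v0 = project(p, R_src, t_src, K)
        u1, v1 = project(p, R_tgt, t_tgt, K)
        flow.append((u1 - u0, v1 - v0))
    return flow


# Illustrative values only (assumptions, not taken from the paper)
K = [[500.0, 0.0, 320.0],
     [0.0, 500.0, 240.0],
     [0.0, 0.0, 1.0]]
identity = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]
points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0)]
t_render = (0.0, 0.0, 1.0)    # pose used to render the image
t_updated = (0.01, 0.0, 1.0)  # pose after a small refinement step

flow = pose_induced_flow(points, identity, t_render, identity, t_updated, K)
```

With these values every point sits at depth 1, so a 0.01 shift along x with a focal length of 500 moves each projection by roughly 5 pixels horizontally and 0 vertically. Because such a flow field is generated by a single rigid transform of the known shape, it is consistent with the object's geometry by construction, which is the intuition behind using it as a shape constraint.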