15 Jan 2019 | Chen Wang, Danfei Xu, Yuke Zhu, Roberto Martín-Martín, Cewu Lu, Li Fei-Fei, Silvio Savarese
**DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion**
**Authors:** Chen Wang, Danfei Xu, Yuke Zhu, Roberto Martín-Martín, Cewu Lu, Li Fei-Fei, Silvio Savarese
**Institution:** Department of Computer Science, Stanford University; Department of Computer Science, Shanghai Jiao Tong University
**Abstract:**
This paper presents DenseFusion, a generic framework for estimating the 6D pose of known objects from RGB-D images. DenseFusion is designed to fully exploit both the RGB and depth data sources, addressing the challenges of heavy occlusion and the latency demands of real-time applications. The framework pairs a heterogeneous architecture that processes RGB and depth data separately with a dense fusion network that extracts pixel-wise dense feature embeddings, from which poses are estimated. An end-to-end iterative pose refinement procedure further improves accuracy while maintaining near real-time inference. Experiments on the YCB-Video and LineMOD datasets demonstrate that DenseFusion outperforms state-of-the-art methods in both accuracy and speed. The method is also deployed on a real robot for grasping and manipulation tasks.
**Key Contributions:**
- **Dense Fusion:** A principled approach to combining color and depth information from RGB-D inputs at the per-pixel level, enabling the model to reason jointly about local appearance and geometry.
- **Iterative Refinement:** A pose refinement module integrated directly into the neural network architecture, improving accuracy while maintaining real-time inference speed and eliminating the need for expensive post-processing steps.
**Methods:**
1. **Semantic Segmentation:** Segments each known object in the RGB image with an encoder-decoder network; the resulting mask selects the image crop and the depth points passed to the later stages.
2. **Dense Feature Extraction:** Extracts color features from the cropped image with a CNN and geometric features from the masked point cloud with a PointNet-like network.
3. **Pixel-wise Dense Fusion:** Fuses the color and geometric features at each pixel, augmented with a pooled global feature, to produce dense per-point embeddings (see the first sketch after this list).
4. **Pose Estimation:** Predicts a 6D pose (rotation and translation) plus a self-confidence score from every dense embedding, and keeps the most confident prediction.
5. **Iterative Refinement:** A refinement network repeatedly predicts a residual pose from the point cloud re-expressed in the current estimate, replacing ICP-style post-processing (see the second sketch below).
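To make steps 3 and 4 concrete, here is a minimal PyTorch-style sketch of pixel-wise fusion with per-point pose and confidence heads. All layer sizes, tensor shapes, and names (`DenseFusionSketch`, `color_emb`, `geo_emb`) are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseFusionSketch(nn.Module):
    """Illustrative sketch of pixel-wise dense fusion (not the official model).

    Inputs:
      color_emb: (B, N, Dc) per-pixel color embeddings from a CNN,
                 sampled at the N segmented depth pixels.
      geo_emb:   (B, N, Dg) per-point geometric embeddings from a
                 PointNet-like network over the masked point cloud.
    """
    def __init__(self, dc=32, dg=32, d_global=256):
        super().__init__()
        # Per-point MLP whose pooled output serves as a global feature.
        self.point_mlp = nn.Sequential(
            nn.Linear(dc + dg, 128), nn.ReLU(),
            nn.Linear(128, d_global), nn.ReLU(),
        )
        fused_dim = dc + dg + d_global
        self.rot_head = nn.Linear(fused_dim, 4)    # quaternion
        self.trans_head = nn.Linear(fused_dim, 3)  # translation
        self.conf_head = nn.Linear(fused_dim, 1)   # self-confidence

    def forward(self, color_emb, geo_emb):
        pixel_feat = torch.cat([color_emb, geo_emb], dim=-1)      # (B, N, Dc+Dg)
        # Global context via symmetric (max) pooling over all points.
        global_feat = self.point_mlp(pixel_feat).max(dim=1, keepdim=True).values
        global_feat = global_feat.expand(-1, pixel_feat.size(1), -1)
        fused = torch.cat([pixel_feat, global_feat], dim=-1)      # dense embedding
        quat = F.normalize(self.rot_head(fused), dim=-1)
        trans = self.trans_head(fused)
        conf = torch.sigmoid(self.conf_head(fused)).squeeze(-1)   # (B, N)
        # Keep the prediction of the most confident point per object.
        best = conf.argmax(dim=1)
        idx = torch.arange(quat.size(0), device=conf.device)
        return quat[idx, best], trans[idx, best], conf
```

The per-point confidence lets the network down-weight ambiguous pixels; in the paper this confidence is learned without direct supervision via a regularizer on the pose loss.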
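Step 5 can be pictured as a small network that predicts a residual pose from the observed points re-expressed in the current estimated frame, applied for a few iterations. The sketch below assumes a placeholder `refiner` module and a standard residual-composition convention; both are illustrative, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def quat_to_mat(q):
    """Minimal quaternion (w, x, y, z) -> rotation matrix, batched."""
    w, x, y, z = q.unbind(-1)
    return torch.stack([
        torch.stack([1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],     -1),
        torch.stack([2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],     -1),
        torch.stack([2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)], -1),
    ], -2)

def refine(points, color_emb, quat, trans, refiner, k=2):
    """Apply k refinement iterations.

    `refiner` is a placeholder network mapping (canonicalized points,
    color features) -> residual (quaternion, translation).
    """
    R, t = quat_to_mat(quat), trans
    for _ in range(k):
        # Re-express observed points in the currently estimated object frame.
        canon = torch.einsum('bij,bnj->bni', R.transpose(1, 2),
                             points - t.unsqueeze(1))
        dq, dt = refiner(canon, color_emb)
        dR = quat_to_mat(F.normalize(dq, dim=-1))
        # Compose the residual: p = R (dR p_model + dt) + t.
        t = t + torch.einsum('bij,bj->bi', R, dt)
        R = R @ dR
    return R, t
```

Because the refiner is a differentiable network rather than ICP, refinement trains end-to-end and adds only a small constant cost at inference.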
**Experiments:**
- **YCB-Video Dataset:** Evaluates performance on objects of varying shape and texture under different degrees of occlusion.
- **LineMOD Dataset:** Compares against state-of-the-art RGB-D methods on a standard benchmark of texture-poor objects.
- **Robotic Grasping:** Uses the estimated poses for grasping on a real robot to test real-world accuracy. Both datasets are typically scored with the ADD/ADD-S pose-error metrics sketched below.
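For reference, a minimal sketch of the standard ADD and ADD-S metrics commonly used on these benchmarks: ADD averages the distance between corresponding model points under the predicted and ground-truth poses, while ADD-S handles symmetric objects by matching each point to its closest counterpart. Function names and the single-object, unbatched shapes here are illustrative.

```python
import torch

def add_metric(pred_R, pred_t, gt_R, gt_t, model_pts):
    """ADD: mean distance between corresponding model points
    transformed by the predicted vs. ground-truth pose.
    model_pts: (M, 3); pred_R, gt_R: (3, 3); pred_t, gt_t: (3,)."""
    pred = model_pts @ pred_R.T + pred_t
    gt = model_pts @ gt_R.T + gt_t
    return (pred - gt).norm(dim=-1).mean()

def add_s_metric(pred_R, pred_t, gt_R, gt_t, model_pts):
    """ADD-S: for symmetric objects, match each transformed point to its
    closest ground-truth point before averaging."""
    pred = model_pts @ pred_R.T + pred_t
    gt = model_pts @ gt_R.T + gt_t
    dists = torch.cdist(pred, gt)          # (M, M) pairwise distances
    return dists.min(dim=1).values.mean()
```

A pose is usually counted correct when this error falls below a fraction of the object diameter (LineMOD) or summarized as the area under the accuracy-threshold curve (YCB-Video).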
**Results:**
- DenseFusion outperforms state-of-the-art methods on both datasets.
- The iterative refinement module significantly improves pose estimation accuracy, and the method degrades more gracefully than prior work under heavy occlusion.
- Because refinement is learned rather than ICP-based, inference is substantially faster than existing RGB-D pipelines, making the method suitable for real-time applications.
**Conclusion:**
DenseFusion provides a robust and efficient solution for 6D object pose estimation from RGB-D images, demonstrating superior performance in various challenging scenarios, including heavy occlusion, while running at near real-time speed.