Learning Rich Features from RGB-D Images for Object Detection and Segmentation


2014 | Saurabh Gupta, Ross Girshick, Pablo Arbeláez, Jitendra Malik
This paper presents a method for object detection and segmentation from RGB-D images. The authors propose a geocentric embedding for depth images that encodes, for each pixel, its height above the ground and the angle its surface normal makes with the gravity direction, in addition to horizontal disparity. This representation lets convolutional neural networks (CNNs) learn stronger features than disparity or depth alone.

The system achieves a mean average precision of 37.3% for object detection, a 56% relative improvement over existing methods. For instance segmentation, the authors propose a decision-forest approach that classifies pixels as foreground or background using shape and geocentric pose features. They also plug their object detectors into a superpixel classification framework for semantic scene segmentation, achieving a 24% relative improvement over the prior state of the art.

The paper further describes 2.5D region proposals for RGB-D images, generated by combining contour detection with structured learning; the authors show that this improves both contour detection and region-proposal quality over existing approaches. They additionally explore synthetic data augmentation to improve performance on the NYUD2 dataset. Across object detection, instance segmentation, and semantic segmentation, their system outperforms existing methods. The paper concludes that advances in RGB-D perception such as these will facilitate the use of perception in fields like robotics.
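The geocentric embedding described above can be illustrated with a minimal numpy sketch. This is not the paper's implementation: it assumes a depth map in meters, known pinhole intrinsics, and an already-estimated gravity direction (the paper estimates gravity from the scene), and it stands in for ground-plane fitting by simply shifting heights so the lowest point is zero.

```python
import numpy as np

def hha_encode(depth, fx, fy, cx, cy, gravity=np.array([0.0, -1.0, 0.0])):
    """Sketch of a geocentric (HHA-style) encoding of a depth image:
    per pixel, (1) horizontal disparity, (2) height above ground, and
    (3) angle of the local surface normal with the gravity direction.
    `gravity` is a unit "up" vector in camera coordinates (assumed here,
    estimated from the scene in the paper)."""
    h, w = depth.shape

    # Back-project pixels to a 3-D point cloud with the pinhole model.
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (us - cx) * z / fx
    y = (vs - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1)              # shape (h, w, 3)

    # Channel 1: horizontal disparity (proportional to inverse depth).
    disparity = 1.0 / np.maximum(z, 1e-6)

    # Channel 2: height above ground = projection onto the gravity axis,
    # shifted so the lowest observed point sits at zero (a crude stand-in
    # for the paper's ground-plane estimation).
    height = pts @ gravity
    height = height - height.min()

    # Channel 3: angle (degrees) between surface normal and gravity.
    # Normals from cross products of finite-difference tangent vectors.
    du = np.gradient(pts, axis=1)
    dv = np.gradient(pts, axis=0)
    normals = np.cross(du, dv)
    normals /= np.maximum(np.linalg.norm(normals, axis=-1, keepdims=True), 1e-6)
    angle = np.degrees(np.arccos(np.clip(normals @ gravity, -1.0, 1.0)))

    # Stack into a 3-channel image that a CNN can consume like RGB.
    return np.stack([disparity, height, angle], axis=-1)
```

The three channels are stacked into an image-like tensor so that a network pretrained on RGB can be fine-tuned on the depth stream, which is the key idea behind using this embedding rather than raw depth.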