Kevin Lai, Liefeng Bo, Xiaofeng Ren, and Dieter Fox
This paper introduces a large-scale, hierarchical multi-view RGB-D object dataset collected using an RGB-D camera. The dataset contains 300 objects organized into 51 categories, with 250,000 RGB-D images and video sequences. The objects are captured from multiple view angles, providing a comprehensive dataset for object recognition and detection. The paper describes the dataset collection procedure, including the use of a prototype RGB-D camera and a Point Grey Research Grasshopper camera for high-resolution RGB images. The dataset is publicly available to the research community.
The paper also presents techniques for RGB-D based object recognition and detection, demonstrating that combining color and depth information significantly improves the quality of results. The techniques are evaluated at two levels: category-level recognition and instance-level recognition. Category-level recognition involves classifying unseen objects into known categories, while instance-level recognition focuses on identifying physically identical objects.
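The difference between the two evaluation levels comes down to how the data is split. The sketch below is a minimal illustration, not the authors' protocol: the tuple layout `(category, instance_id, view_id)`, the function names, and the toy object names are all assumptions made for this example.

```python
def category_level_split(samples, held_out):
    """Hold out whole object instances, so test objects are never seen in training.

    samples: list of (category, instance_id, view_id) tuples (assumed layout).
    held_out: dict mapping category -> instance_id to withhold for testing.
    """
    train = [s for s in samples if held_out.get(s[0]) != s[1]]
    test = [s for s in samples if held_out.get(s[0]) == s[1]]
    return train, test


def instance_level_split(samples, test_views):
    """Every object appears in training; only some of its views are withheld."""
    train = [s for s in samples if s[2] not in test_views]
    test = [s for s in samples if s[2] in test_views]
    return train, test


# Toy data: 2 categories x 2 instances x 3 views (hypothetical names).
toy_objects = {"apple": ["apple_1", "apple_2"], "mug": ["mug_1", "mug_2"]}
samples = [(cat, inst, view)
           for cat, insts in toy_objects.items()
           for inst in insts
           for view in range(3)]

cat_train, cat_test = category_level_split(samples, {"apple": "apple_2", "mug": "mug_1"})
inst_train, inst_test = instance_level_split(samples, test_views={2})
```

In the category-level split the test instances never appear in training, whereas in the instance-level split every object appears on both sides and only the views differ.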
The dataset includes 8 video sequences of natural scenes, which are annotated with ground truth object pose angles. The authors propose an alternative approach to traditional video frame-by-frame annotation, using 3D scene reconstruction and labeling tools to label objects in the 3D reconstruction and then projecting these labels back into the video frames.
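The projection step can be sketched as follows. This is a simplified illustration under stated assumptions rather than the authors' tool: it assumes a pinhole camera model, a known world-to-camera pose for each frame, and made-up intrinsics (`fx`, `fy`, `cx`, `cy`); all function names are hypothetical.

```python
def project_point(point_world, rotation, translation, fx, fy, cx, cy):
    """Project one 3D world point into pixel coordinates with a pinhole model.

    rotation (3x3 nested lists) and translation (length 3) map world
    coordinates to camera coordinates. Returns None for points behind the camera.
    """
    x, y, z = (
        sum(rotation[r][c] * point_world[c] for c in range(3)) + translation[r]
        for r in range(3)
    )
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def label_bounding_box(labeled_points, rotation, translation, fx, fy, cx, cy):
    """Return (u_min, v_min, u_max, v_max) around a labeled object's projected points."""
    pixels = [project_point(p, rotation, translation, fx, fy, cx, cy)
              for p in labeled_points]
    pixels = [p for p in pixels if p is not None]
    if not pixels:
        return None
    us, vs = zip(*pixels)
    return (min(us), min(vs), max(us), max(vs))


# Toy example: camera at the world origin, three points labeled as one object.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
object_points = [(-0.1, -0.1, 1.0), (0.1, 0.1, 1.0), (0.0, 0.0, 2.0)]
box = label_bounding_box(object_points, identity, [0.0, 0.0, 0.0],
                         fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

Because the object is labeled once in the reconstruction, the same labeled points can be reprojected with each frame's pose instead of annotating every frame by hand.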
In the object recognition experiments, the authors evaluate several classifiers using shape features alone, visual features alone, and the two combined. Visual features outperform shape features at both levels; shape features contribute relatively more to category-level recognition, while visual features are relatively more effective for instance-level recognition. Combining both generally gives the best performance.
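The three-way comparison (shape only, visual only, both) can be illustrated with a toy experiment. The sketch below is not the paper's setup: it swaps in a nearest-centroid classifier as a simple stand-in for the classifiers evaluated, fuses cues by plain concatenation (an assumption), and uses invented one-dimensional features and class names.

```python
import math


def fit_centroids(features, labels):
    """Average the feature vectors of each class."""
    sums, counts = {}, {}
    for vec, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, vec):
    """Return the label of the closest class centroid (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


def accuracy(train, test, select):
    """Train and test using only the features returned by select(sample)."""
    centroids = fit_centroids([select(s) for s in train], [s["label"] for s in train])
    hits = sum(predict(centroids, select(s)) == s["label"] for s in test)
    return hits / len(test)


# Toy samples: "shape" and "visual" are hypothetical 1-D feature vectors.
train = [
    {"label": "lemon", "shape": [1.0], "visual": [1.0]},
    {"label": "banana", "shape": [0.0], "visual": [1.0]},
    {"label": "lime", "shape": [1.0], "visual": [0.0]},
]
test = [
    {"label": "lemon", "shape": [0.9], "visual": [0.9]},
    {"label": "banana", "shape": [0.1], "visual": [0.9]},
    {"label": "lime", "shape": [0.9], "visual": [0.1]},
]

shape_acc = accuracy(train, test, lambda s: s["shape"])
visual_acc = accuracy(train, test, lambda s: s["visual"])
combined_acc = accuracy(train, test, lambda s: s["shape"] + s["visual"])
```

In this toy data each cue alone confuses one pair of classes, so only the concatenated features separate all three. The numbers say nothing about the paper's actual results.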
For object detection, the authors use a sliding window approach and evaluate the performance of different feature combinations. The results show that combining RGB and depth features significantly improves precision across all recall levels.
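A sliding window detector of this kind can be sketched in a few lines. The example below is a minimal illustration only: the two scoring functions are hypothetical stand-ins for trained RGB and depth models, and summing their scores is an assumed fusion rule, not necessarily the one used in the paper.

```python
def sliding_windows(width, height, win_w, win_h, stride):
    """Yield (x, y, win_w, win_h) boxes that fit fully inside the image."""
    for y in range(0, height - win_h + 1, stride):
        for x in range(0, width - win_w + 1, stride):
            yield (x, y, win_w, win_h)


def detect(windows, score_fn, threshold):
    """Keep every window whose score reaches the threshold."""
    scored = ((box, score_fn(box)) for box in windows)
    return [(box, score) for box, score in scored if score >= threshold]


# Hypothetical scorers: each cue alone also fires on windows without the object.
def rgb_score(box):
    return 1.0 if box[0] == 40 else 0.0   # fires on a whole column of windows


def depth_score(box):
    return 1.0 if box[1] == 20 else 0.0   # fires on a whole row of windows


def combined_score(box):
    return rgb_score(box) + depth_score(box)


windows = list(sliding_windows(width=100, height=60, win_w=20, win_h=20, stride=20))
rgb_hits = detect(windows, rgb_score, threshold=1.0)
combined_hits = detect(windows, combined_score, threshold=2.0)
```

Here the RGB-only detector returns three windows while the combined detector keeps only the one where both cues agree, which is the intuition behind higher precision at the same recall.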
The paper concludes by discussing the benefits of the RGB-D Object Dataset for object recognition and detection tasks and provides access to the dataset and associated tools.