Understanding Matterport3D: Learning from RGB-D Data in Indoor Environments

Matterport3D is a large-scale RGB-D dataset containing 10,800 panoramic views built from 194,400 RGB-D images of 90 building-scale scenes. It includes surface reconstructions (24,727,520 textured triangles in total), camera poses, and 2D and 3D semantic segmentations, with instance-level semantic annotations covering 50,811 object instances. Precise global alignment, comprehensive viewpoint sampling, and multiple, diverse views of each surface make it suitable for a range of supervised and self-supervised computer vision tasks.

The data was collected with a tripod-mounted camera rig carrying three color and three depth cameras. For each panorama, the rig rotates around the direction of gravity to 6 distinct orientations, and each of the 3 RGB cameras captures an HDR photo at every orientation. The 3 depth cameras acquire data continuously as the rig rotates, and this data is integrated to synthesize a 1280x1024 depth image aligned with each color image. Each panorama therefore yields 18 RGB-D images with nearly coincident centers of projection, at approximately the height of a human observer.

The dataset has been used to train models for keypoint matching, view overlap prediction, surface normal estimation from color, and region-type classification, and it also supports semantic segmentation and semantic voxel labeling. The reported results indicate that Matterport3D provides high-quality training data for these tasks, leading to improved performance compared to previous datasets. The dataset is also valuable for studying fine-scale features of imagery in scenes and for learning to predict view-dependent surface properties.
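
The image counts quoted in the summary follow directly from the capture procedure: 6 rig orientations times 3 color cameras gives 18 RGB-D images per panorama, and 10,800 panoramas times 18 gives 194,400 images. The short Python sketch below simply re-derives those figures as a consistency check. The constant and function names are illustrative only and are not part of any Matterport3D tooling.

```python
# Consistency check on the capture figures quoted in the summary.
# All names here are hypothetical; only the numbers come from the text.

ORIENTATIONS_PER_PANORAMA = 6   # rig rotations around the gravity direction
COLOR_CAMERAS = 3               # each color image gets an aligned depth image
PANORAMAS = 10_800
BUILDINGS = 90


def images_per_panorama(orientations: int, cameras: int) -> int:
    """One RGB-D image per (orientation, color camera) pair."""
    return orientations * cameras


def total_images(panoramas: int, per_panorama: int) -> int:
    """Total RGB-D images across all panoramas."""
    return panoramas * per_panorama


per_pano = images_per_panorama(ORIENTATIONS_PER_PANORAMA, COLOR_CAMERAS)
total = total_images(PANORAMAS, per_pano)
avg_panoramas_per_building = PANORAMAS / BUILDINGS

print(per_pano)                    # 18
print(total)                       # 194400
print(avg_panoramas_per_building)  # 120.0
```

The last value is only an average derived from the totals; the summary does not state how panoramas are distributed across individual buildings.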

Matterport3D: Learning from RGB-D Data in Indoor Environments

18 Sep 2017 | Angel Chang, Angela Dai, Thomas Funkhouser, Maciej Halber, Matthias Nießner, Manolis Savva, Shuran Song, Andy Zeng, Yinda Zhang