6 Apr 2017 | Iro Armeni*, Alexander Sax*, Amir R. Zamir, Silvio Savarese
The paper introduces a comprehensive dataset for indoor scene understanding, covering large-scale indoor spaces with a variety of mutually registered modalities: RGB images, depth, surface normals, global XYZ images, instance-level semantic annotations in 2D and 3D, and registered raw and semantically annotated 3D meshes and point clouds. Spanning over 6,000 m² and containing more than 70,000 RGB images, the dataset is collected from six large-scale indoor areas in educational and office buildings, with all modalities registered in the same reference system. It provides rich geometric and semantic information that complements the dense appearance features of RGB images, and it is designed to support joint and cross-modal learning models as well as unsupervised approaches. The paper also details the collection and processing pipeline, including the use of the Matterport Camera for 3D reconstruction and the generation of additional data through sampling and semantic annotation. Baseline results on 3D object detection demonstrate the dataset's potential for advancing research in this field.
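Because all modalities are registered to the same reference system per camera view, a natural way to consume the data is to group the files belonging to one frame. The sketch below is a minimal illustration only: the file names, suffixes, and directory layout are hypothetical placeholders, not the dataset's actual release structure.

```python
from dataclasses import dataclass
from pathlib import Path

import numpy as np
import imageio.v2 as imageio  # any image reader works here


@dataclass
class Frame:
    """One camera view with its mutually registered modalities."""
    rgb: np.ndarray        # H x W x 3, color image
    depth: np.ndarray      # H x W, depth map (units depend on the release's encoding)
    normals: np.ndarray    # H x W x 3, surface normal vectors
    xyz: np.ndarray        # H x W x 3, global XYZ coordinates per pixel
    semantics: np.ndarray  # H x W, instance-level semantic labels


def load_frame(root: Path, frame_id: str) -> Frame:
    """Load all registered modalities for a single frame.

    The suffixes used below ("rgb", "depth", ...) are illustrative
    assumptions, not the dataset's documented naming convention.
    """
    def read(suffix: str) -> np.ndarray:
        return np.asarray(imageio.imread(root / f"{frame_id}_{suffix}.png"))

    return Frame(
        rgb=read("rgb"),
        depth=read("depth"),
        normals=read("normals"),
        xyz=read("xyz"),
        semantics=read("semantics"),
    )


# Hypothetical usage:
# frame = load_frame(Path("area_1/data"), "camera_000")
# print(frame.rgb.shape, frame.semantics.dtype)
```

Keeping the modalities together in one record like this makes cross-modal experiments (e.g., predicting normals from RGB, or semantics from depth) straightforward, since every array in a `Frame` refers to the same pixels of the same view.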