6 Apr 2017 | Iro Armeni*, Alexander Sax*, Amir R. Zamir, Silvio Savarese
The paper presents a large-scale indoor scene dataset that combines 2D, 2.5D, and 3D modalities, including RGB images, depth, surface normals, global XYZ images, and instance-level semantic annotations in both 2D and 3D. The dataset covers over 6,000 square meters and contains over 70,000 RGB images, along with corresponding depth, surface normals, semantic annotations, and camera information. It also includes registered raw and semantically annotated 3D meshes and point clouds. The dataset enables the development of joint and cross-modal learning models and potentially unsupervised approaches by leveraging the regularities present in large-scale indoor spaces.
The dataset includes RGB, depth, surface normal, semantic, and global XYZ images (the latter in OpenEXR format), as well as 3D meshes and point clouds of the same indoor spaces. The modalities can be used independently or jointly to develop learning models that transfer across domains, and the annotations are consistent across all modalities and dimensions. In total, there are 70,496 regular RGB and 1,413 equirectangular RGB images, each with corresponding depth, surface normal, and semantic images, a global XYZ image in OpenEXR format, and camera metadata. In addition, the dataset provides whole-building 3D reconstructions as textured meshes, the corresponding 3D semantic meshes, and colored 3D point clouds of the same areas with a total of 695,878,620 points.
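Because the 2D modalities are released per frame and co-registered, pairing them is largely a matter of matching file names. The sketch below is a minimal illustration under an assumed directory layout, assumed file names, and an assumed 16-bit depth encoding (1/512 m per unit); none of these specifics are stated above, so they should be adjusted to the actual release.

```python
from pathlib import Path
import numpy as np
from PIL import Image

# Minimal sketch: load the co-registered 2D modalities for one frame.
# Directory layout, file names, and the 1/512 m depth scale are assumptions
# for illustration only; check the dataset documentation for the real format.
root = Path("area_1/data")                 # hypothetical area folder
frame = "camera_000_office_1_frame_0"      # hypothetical frame identifier

rgb      = np.asarray(Image.open(root / "rgb"      / f"{frame}_rgb.png"))
depth    = np.asarray(Image.open(root / "depth"    / f"{frame}_depth.png"))
normals  = np.asarray(Image.open(root / "normal"   / f"{frame}_normals.png"))
semantic = np.asarray(Image.open(root / "semantic" / f"{frame}_semantic.png"))

# Assumed encoding: 16-bit depth PNG where 512 units correspond to 1 meter.
depth_m = depth.astype(np.float32) / 512.0

# The modalities are registered pixel-to-pixel, so the same (row, col)
# indexes the same surface point in RGB, depth, normals, and semantics.
assert rgb.shape[:2] == depth.shape[:2] == normals.shape[:2] == semantic.shape[:2]
```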
The data were collected in six large-scale indoor areas from three different buildings. Within each area, all modalities are registered to the same reference system, yielding pixel-to-pixel correspondences among them. The dataset includes 3D point clouds and 3D mesh models, along with their semantic counterparts. The point clouds are generated by densely and uniformly sampling points on the mesh surface and assigning each point the corresponding color. Semantic annotations are made on the 3D point cloud, with every point assigned one of 13 object classes: ceiling, floor, wall, beam, column, window, door, table, chair, sofa, bookcase, board, and clutter. Room and scene labels are also provided for the point cloud data.
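As a concrete illustration of how a colored point cloud can be produced from a colored mesh, here is a minimal NumPy sketch that samples the surface uniformly by area. It assumes per-face colors are available and is a generic technique, not the authors' reconstruction pipeline.

```python
import numpy as np

def sample_points_on_mesh(vertices, faces, face_colors, n_points):
    """Uniformly sample colored points on a triangle mesh surface.

    vertices:    (V, 3) float array of vertex positions
    faces:       (F, 3) int array of vertex indices per triangle
    face_colors: (F, 3) uint8 array of per-face RGB colors (assumed available)
    n_points:    number of points to sample
    """
    tri = vertices[faces]                                   # (F, 3, 3)
    # Triangle areas via the cross product of two edge vectors.
    cross = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    areas = 0.5 * np.linalg.norm(cross, axis=1)
    # Pick triangles with probability proportional to area (uniform density).
    probs = areas / areas.sum()
    idx = np.random.choice(len(faces), size=n_points, p=probs)
    # Uniform barycentric coordinates inside each chosen triangle.
    u = np.random.rand(n_points, 1)
    v = np.random.rand(n_points, 1)
    over = (u + v) > 1.0
    u[over] = 1.0 - u[over]
    v[over] = 1.0 - v[over]
    t = tri[idx]
    points = t[:, 0] + u * (t[:, 1] - t[:, 0]) + v * (t[:, 2] - t[:, 0])
    colors = face_colors[idx]
    return points, colors
```

Choosing the sample count proportionally to total surface area keeps the point density roughly constant across rooms of different sizes.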
The dataset provides a variety of 2D modalities, including RGB images, depth images, surface normal images, and semantically labeled images. The RGB images are stored in full high-definition at 1080x1080 resolution. The depth images are computed from the 3D mesh rather than directly from the scanner. The surface normal images are computed from a normals pass in Blender and saved as 24-bit RGB PNGs. The semantically labeled images are likewise saved as 24-bit RGB PNGs, with each pixel's color value directly interpretable as an index into the list of semantic labels.
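To make the label encoding concrete, the following sketch decodes a semantic PNG into integer label indices. It assumes the 24-bit RGB value is read as a base-256 integer with R as the most significant byte and that the label list ships as a JSON array; both the byte order and the file names are assumptions for illustration.

```python
import json
import numpy as np
from PIL import Image

# Minimal sketch: recover per-pixel label indices from a semantic image.
# Assumptions (not stated above): the 24-bit RGB value encodes the index in
# base 256 with R as the most significant byte, and the label list is
# distributed as a JSON array. File names below are placeholders.
with open("semantic_labels.json") as f:
    labels = json.load(f)

img = np.asarray(Image.open("frame_semantic.png").convert("RGB"), dtype=np.uint32)
index = (img[..., 0] << 16) | (img[..., 1] << 8) | img[..., 2]

# Look up the label string for one pixel.
print(labels[int(index[100, 200])])
```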