Indoor Segmentation and Support Inference from RGBD Images

2012 | Nathan Silberman¹, Derek Hoiem², Pushmeet Kohli³, and Rob Fergus¹
This paper presents a method for interpreting indoor scenes from RGBD images, focusing on segmenting surfaces and objects and inferring their support relationships. The approach targets the messy, cluttered indoor environments common in real-world settings, combining 3D cues from depth with appearance-based cues to segment objects and infer which surfaces physically support which objects. The work introduces a novel integer programming formulation for support inference (sketched schematically below) and a new dataset of 1449 RGBD images capturing 464 diverse indoor scenes, annotated in detail with object labels and physical relations.

The method first infers the 3D structure of the scene (a plane-fitting sketch is given after this summary) and then jointly parses the image into objects while estimating their support relations. Depth cues overcome the limitations of single-view, image-only approaches and yield more accurate geometric structure. Each region is classified into one of four structural classes, "ground," "permanent structures," "furniture," and "props," which aid both segmentation and support estimation. Support inference integrates physical constraints with statistical priors on which classes typically support which others, and it handles heavy occlusion and invisible supporting surfaces by reasoning about the location of hidden elements and their interactions with visible ones.

Evaluated on the new dataset, the method improves on existing approaches for both support inference and object segmentation. It outperforms the baselines in accuracy and copes with occlusion and complex support relationships; in particular, it reliably recovers both the supporting region and the type of support when segmentations are accurate, and initial estimates of support and major surfaces in turn lead to better segmentations. The paper concludes that the approach provides a robust framework for parsing complex indoor scenes from appearance cues, 3D cues, surface fitting, and scene priors.
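The support inference step can be pictured as an integer program over per-region variables. The following is a minimal schematic sketch, not the paper's exact energy: the variables S_i, T_i, M_i and the cost terms D and Φ are illustrative notation for the quantities described above.

```latex
% Schematic support-inference integer program (illustrative notation only).
% For each region R_i the unknowns are: a supporting region S_i (a visible
% region, a hidden surface, or none), a support type T_i (from below / from
% behind), and a structure class M_i in {ground, structure, furniture, prop}.
\begin{align*}
\min_{S,\,T,\,M}\quad
  & \sum_i \big[\, D_{\text{sup}}(S_i, T_i) + D_{\text{cls}}(M_i) \,\big]
    \;+\; \sum_i \Phi\big(M_i, M_{S_i}, T_i\big) \\
\text{s.t.}\quad
  & S_i \neq \varnothing \ \text{ whenever } M_i \neq \text{ground}
    && \text{(everything but the floor is supported)} \\
  & M_i = \text{ground} \;\Rightarrow\; S_i = \varnothing
    && \text{(the floor needs no support)} \\
  & T_i = \text{below} \;\Rightarrow\; R_{S_i} \text{ lies at or below } R_i
    && \text{(geometric consistency of supports)}
\end{align*}
```

Here D_sup and D_cls stand for data terms from classifiers trained on depth and appearance features, and Φ for a prior that penalizes implausible combinations, such as a piece of furniture resting on a prop.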
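As a concrete illustration of the 3D-structure step, the sketch below greedily fits dominant planes (floor, walls) to the point cloud obtained by back-projecting the depth map through the camera intrinsics. This is a minimal RANSAC-style sketch under assumed parameters and helper names; the paper's pipeline additionally aligns the cloud to the principal room directions and scores candidate surfaces with learned features, none of which is reproduced here.

```python
import numpy as np

def fit_planes_ransac(points, n_planes=3, iters=500,
                      inlier_thresh=0.02, min_inliers=2000, seed=0):
    """Greedily extract dominant planes from an (N, 3) point cloud in metres.
    Illustrative sketch only: thresholds and the greedy peel-off strategy are
    assumptions, not the paper's implementation."""
    rng = np.random.default_rng(seed)
    remaining = np.asarray(points, dtype=np.float64)
    planes = []
    for _ in range(n_planes):
        if len(remaining) < min_inliers:
            break
        best_inliers, best_model = None, None
        for _ in range(iters):
            # Sample three points and form the candidate plane  n . x = d.
            p0, p1, p2 = remaining[rng.choice(len(remaining), 3, replace=False)]
            normal = np.cross(p1 - p0, p2 - p0)
            norm = np.linalg.norm(normal)
            if norm < 1e-8:                       # nearly collinear sample
                continue
            normal /= norm
            d = normal @ p0
            inliers = np.abs(remaining @ normal - d) < inlier_thresh
            if best_inliers is None or inliers.sum() > best_inliers.sum():
                best_inliers, best_model = inliers, (normal, d)
        if best_inliers is None or best_inliers.sum() < min_inliers:
            break
        planes.append(best_model)
        remaining = remaining[~best_inliers]      # peel the plane off, repeat
    return planes
```

A typical call would pass the back-projected depth map reshaped to N×3 points (X = (u - cx)·Z/fx, Y = (v - cy)·Z/fy, Z from the depth image); among the returned planes, the one with the most vertical normal and lowest height is a natural floor candidate.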