Feature Calibrating and Fusing Network for RGB-D Salient Object Detection


2024 | Zhang, Q., Qin, Q., Yang, Y. et al.
This paper proposes a two-stage RGB-D salient object detection model that addresses the challenges posed by low-quality and foreground-inconsistent depth images in RGB-D saliency detection. The model consists of an image generation stage and a saliency reasoning stage. In the image generation stage, a Two-steps Sample Selection (TSS) strategy selects high-quality, foreground-consistent depth images from the original RGB-D image pairs to serve as supervision for the image generation network. In the saliency reasoning stage, a Feature Calibrating and Fusing Network (FCFNet) calibrates the original depth information with the aid of the generated pseudo depth images and then performs cross-modal feature fusion for the final saliency prediction. FCFNet comprises three modules: a Depth Feature Calibration (DFC) module, a Multi-modal Multi-scale Fusion (MMF) module, and a Shallow-level Feature Injection (SFI) module. In addition, a Region Consistency Aware (RCA) loss is introduced as an auxiliary loss to enhance the local regional consistency within the foreground and background regions of the predicted saliency maps.

The model is evaluated on six benchmark datasets and outperforms existing state-of-the-art methods, achieving better accuracy and robustness in salient object detection, particularly in challenging scenarios with low-quality or foreground-inconsistent depth images.
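To make the saliency-reasoning stage concrete, the following is a minimal PyTorch sketch of the data flow the summary describes: depth features are first calibrated against a generated pseudo depth image, then fused with RGB features to predict a saliency map. The module internals are not specified in this summary, so the FCFNetSketch class, its layer choices, and the 1x1-convolution stand-ins for DFC, MMF, and SFI are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the saliency-reasoning stage; layer choices are
# illustrative assumptions only, not the paper's actual architecture.
import torch
import torch.nn as nn

class FCFNetSketch(nn.Module):
    """Illustrative skeleton: calibrate raw depth features with a generated
    pseudo depth image, then fuse RGB and depth features for saliency."""
    def __init__(self, channels: int = 64):
        super().__init__()
        # Stand-ins for the paper's encoders (assumed shapes).
        self.rgb_encoder = nn.Conv2d(3, channels, 3, padding=1)
        self.depth_encoder = nn.Conv2d(1, channels, 3, padding=1)
        self.pseudo_encoder = nn.Conv2d(1, channels, 3, padding=1)
        # DFC stand-in: recombine raw depth features with pseudo-depth cues.
        self.dfc = nn.Conv2d(2 * channels, channels, 1)
        # MMF stand-in: cross-modal fusion of RGB and calibrated depth.
        self.mmf = nn.Conv2d(2 * channels, channels, 1)
        # Prediction head (SFI would inject shallow features before this).
        self.head = nn.Conv2d(channels, 1, 1)

    def forward(self, rgb, depth, pseudo_depth):
        f_rgb = self.rgb_encoder(rgb)
        f_d = self.depth_encoder(depth)
        f_p = self.pseudo_encoder(pseudo_depth)
        # Calibrate raw depth features with pseudo-depth guidance (DFC).
        f_d = self.dfc(torch.cat([f_d, f_p], dim=1))
        # Fuse the two modalities (MMF) and predict the saliency map.
        fused = self.mmf(torch.cat([f_rgb, f_d], dim=1))
        return torch.sigmoid(self.head(fused))

# Usage with dummy tensors:
net = FCFNetSketch()
sal = net(torch.rand(1, 3, 224, 224), torch.rand(1, 1, 224, 224),
          torch.rand(1, 1, 224, 224))
print(sal.shape)  # torch.Size([1, 1, 224, 224])
```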
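The summary does not give the RCA loss formula, so the sketch below shows one plausible way to penalize local inconsistency inside foreground and background regions: a local-variance term masked by the binary ground truth. The function name, window size, and weighting are hypothetical; the authors' actual definition may differ.

```python
# Hypothetical region-consistency penalty in the spirit of the RCA loss;
# this local-variance formulation is an assumption, not the paper's.
import torch
import torch.nn.functional as F

def region_consistency_penalty(pred, gt, window: int = 7):
    """Penalize local variation of `pred` inside GT foreground and
    background regions, encouraging locally uniform saliency values."""
    pad = window // 2
    local_mean = F.avg_pool2d(pred, window, stride=1, padding=pad)
    local_var = (pred - local_mean) ** 2
    fg, bg = gt, 1.0 - gt  # binary masks for the two regions
    fg_term = (local_var * fg).sum() / fg.sum().clamp(min=1.0)
    bg_term = (local_var * bg).sum() / bg.sum().clamp(min=1.0)
    return fg_term + bg_term

# Used as an auxiliary term alongside a standard saliency loss:
pred = torch.rand(1, 1, 64, 64)                 # dummy saliency map
gt = (torch.rand(1, 1, 64, 64) > 0.5).float()   # dummy binary ground truth
loss = F.binary_cross_entropy(pred, gt) + 0.5 * region_consistency_penalty(pred, gt)
```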