Feature Calibrating and Fusing Network for RGB-D Salient Object Detection


2024 | Zhang, Q., Qin, Q., Yang, Y. et al.
This paper proposes a two-stage RGB-D salient object detection model that addresses the challenges posed by low-quality and foreground-inconsistent depth images in RGB-D saliency detection. The model consists of an image generation stage and a saliency reasoning stage. In the image generation stage, a Two-steps Sample Selection (TSS) strategy selects high-quality, foreground-consistent depth images from the original RGB-D image pairs to serve as supervision for the image generation network. In the saliency reasoning stage, a Feature Calibrating and Fusing Network (FCFNet) calibrates the original depth information with the aid of the generated pseudo depth images and then performs cross-modal feature fusion for the final saliency prediction. FCFNet comprises three modules: a Depth Feature Calibration (DFC) module, a Multi-modal Multi-scale Fusion (MMF) module, and a Shallow-level Feature Injection (SFI) module. In addition, a Region Consistency Aware (RCA) loss is introduced as an auxiliary loss to enhance the local regional consistency within the foreground and background regions of the predicted saliency maps.

The model is evaluated on six benchmark datasets and outperforms existing state-of-the-art methods, achieving better accuracy and robustness in salient object detection, particularly in challenging scenarios with low-quality or foreground-inconsistent depth images.
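To make the saliency-reasoning stage concrete, the following is a minimal PyTorch sketch of the data flow the summary describes: depth features are first calibrated against a generated pseudo depth image, then fused with RGB features to predict a saliency map. The module internals are not specified in this summary, so the FCFNetSketch class, its layer choices, and the 1x1-convolution stand-ins for DFC, MMF, and SFI are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the saliency-reasoning stage; layer choices are
# illustrative assumptions only, not the paper's actual architecture.
import torch
import torch.nn as nn

class FCFNetSketch(nn.Module):
    """Illustrative skeleton: calibrate raw depth features with a generated
    pseudo depth image, then fuse RGB and depth features for saliency."""
    def __init__(self, channels: int = 64):
        super().__init__()
        # Stand-ins for the paper's encoders (assumed shapes).
        self.rgb_encoder = nn.Conv2d(3, channels, 3, padding=1)
        self.depth_encoder = nn.Conv2d(1, channels, 3, padding=1)
        self.pseudo_encoder = nn.Conv2d(1, channels, 3, padding=1)
        # DFC stand-in: recombine raw depth features with pseudo-depth cues.
        self.dfc = nn.Conv2d(2 * channels, channels, 1)
        # MMF stand-in: cross-modal fusion of RGB and calibrated depth.
        self.mmf = nn.Conv2d(2 * channels, channels, 1)
        # Prediction head (SFI would inject shallow features before this).
        self.head = nn.Conv2d(channels, 1, 1)

    def forward(self, rgb, depth, pseudo_depth):
        f_rgb = self.rgb_encoder(rgb)
        f_d = self.depth_encoder(depth)
        f_p = self.pseudo_encoder(pseudo_depth)
        # Calibrate raw depth features with pseudo-depth guidance (DFC).
        f_d = self.dfc(torch.cat([f_d, f_p], dim=1))
        # Fuse the two modalities (MMF) and predict the saliency map.
        fused = self.mmf(torch.cat([f_rgb, f_d], dim=1))
        return torch.sigmoid(self.head(fused))

# Usage with dummy tensors:
net = FCFNetSketch()
sal = net(torch.rand(1, 3, 224, 224), torch.rand(1, 1, 224, 224),
          torch.rand(1, 1, 224, 224))
print(sal.shape)  # torch.Size([1, 1, 224, 224])
```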
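The summary does not give the RCA loss formula, so the sketch below shows one plausible way to penalize local inconsistency inside foreground and background regions: a local-variance term masked by the binary ground truth. The function name, window size, and weighting are hypothetical; the authors' actual definition may differ.

```python
# Hypothetical region-consistency penalty in the spirit of the RCA loss;
# this local-variance formulation is an assumption, not the paper's.
import torch
import torch.nn.functional as F

def region_consistency_penalty(pred, gt, window: int = 7):
    """Penalize local variation of `pred` inside GT foreground and
    background regions, encouraging locally uniform saliency values."""
    pad = window // 2
    local_mean = F.avg_pool2d(pred, window, stride=1, padding=pad)
    local_var = (pred - local_mean) ** 2
    fg, bg = gt, 1.0 - gt  # binary masks for the two regions
    fg_term = (local_var * fg).sum() / fg.sum().clamp(min=1.0)
    bg_term = (local_var * bg).sum() / bg.sum().clamp(min=1.0)
    return fg_term + bg_term

# Used as an auxiliary term alongside a standard saliency loss:
pred = torch.rand(1, 1, 64, 64)                 # dummy saliency map
gt = (torch.rand(1, 1, 64, 64) > 0.5).float()   # dummy binary ground truth
loss = F.binary_cross_entropy(pred, gt) + 0.5 * region_consistency_penalty(pred, gt)
```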