Learning Adaptive Fusion Bank for Multi-modal Salient Object Detection

3 Jun 2024 | Kunpeng Wang, Zhengzheng Tu, Chenglong Li, Cheng Zhang, Bin Luo
The paper introduces Learning Adaptive Fusion Bank (LAFB), a novel approach to Multi-modal Salient Object Detection (MSOD). LAFB aims to improve MSOD performance by integrating visible, depth, and thermal infrared sources. The method addresses five major challenges in MSOD: center bias, scale variation, image clutter, low illumination, and thermal crossover or depth ambiguity. To tackle them, LAFB builds a bank of five representative fusion schemes, each designed to handle a specific challenge. These schemes are embedded in a hierarchical encoder-decoder framework, where an adaptive ensemble module selects the appropriate fusion scheme based on the input data.
An indirect interactive guidance module additionally integrates high-level semantic information with low-level detailed features, which improves the accuracy of detecting salient hollow objects. Extensive experiments on RGBD and RGBT datasets demonstrate that LAFB outperforms state-of-the-art methods when multiple complex challenges must be handled simultaneously. The code and results are available at <https://github.com/Angknpng/LAFB>.
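The idea of choosing among several fusion schemes per input can be illustrated with a small sketch. The code below is not the authors' implementation: the five toy fusion functions, the name `adaptive_ensemble`, and the use of softmax weights over externally supplied scores are all assumptions made for illustration. In a real network the scores would presumably come from a learned module that looks at the input features, and the fused tensors would be feature maps rather than short lists.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
FusionScheme = Callable[[Sequence[float], Sequence[float]], Vector]


# Five toy fusion schemes (hypothetical stand-ins, not the paper's schemes).
def fuse_add(rgb: Sequence[float], aux: Sequence[float]) -> Vector:
    return [r + a for r, a in zip(rgb, aux)]


def fuse_mul(rgb: Sequence[float], aux: Sequence[float]) -> Vector:
    return [r * a for r, a in zip(rgb, aux)]


def fuse_max(rgb: Sequence[float], aux: Sequence[float]) -> Vector:
    return [max(r, a) for r, a in zip(rgb, aux)]


def fuse_mean(rgb: Sequence[float], aux: Sequence[float]) -> Vector:
    return [(r + a) / 2.0 for r, a in zip(rgb, aux)]


def fuse_gated(rgb: Sequence[float], aux: Sequence[float]) -> Vector:
    # The auxiliary modality gates the visible features through a sigmoid.
    return [r / (1.0 + math.exp(-a)) for r, a in zip(rgb, aux)]


SCHEMES: List[FusionScheme] = [fuse_add, fuse_mul, fuse_max, fuse_mean, fuse_gated]


def softmax(scores: Sequence[float]) -> Vector:
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_ensemble(
    rgb: Sequence[float],
    aux: Sequence[float],
    schemes: Sequence[FusionScheme],
    scores: Sequence[float],
) -> Tuple[Vector, Vector]:
    """Blend the outputs of all schemes with softmax weights.

    `scores` is an assumption: it stands in for whatever a learned
    module would predict from the input to rank the schemes.
    """
    if len(rgb) != len(aux):
        raise ValueError("rgb and aux must have the same length")
    if len(schemes) != len(scores):
        raise ValueError("need exactly one score per fusion scheme")
    weights = softmax(scores)
    outputs = [scheme(rgb, aux) for scheme in schemes]
    fused = [
        sum(w * out[i] for w, out in zip(weights, outputs))
        for i in range(len(rgb))
    ]
    return fused, weights
```

With equal scores the result is the plain average of all five scheme outputs, while a single dominant score makes the ensemble behave almost like a hard selection of that scheme. Soft weighting is used here only because it is simple to write down; the paper's adaptive ensemble module may combine the schemes differently.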