The paper introduces IRSAM, a novel approach for Infrared Small Target Detection (IRSTD) that leverages the Segment Anything Model (SAM) and enhances its encoder-decoder architecture to improve performance on IRSTD tasks. The main challenges in IRSTD include the low signal-to-noise ratio (SNR) and blurred target edges due to the nature of infrared imaging. To address these issues, the authors propose two key contributions: a Wavelet-based Perona-Malik Diffusion (WPMD) module and a Granularity-Aware Decoder (GAD).
1. **WPMD Module**: This module is designed to enhance the encoder's ability to preserve edge-related features while suppressing noise in infrared images. It uses wavelet transform to substitute the gradient term in the Perona-Malik diffusion equation, effectively preserving structural information and reducing noise.
2. **GAD**: This module fuses multi-granularity features from the encoder to improve the mask representation of objects of various sizes and shapes. It integrates global semantic context with local fine-grained features to sharpen segmentation boundaries.
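To make the idea behind the WPMD module more concrete, the sketch below implements the classic Perona-Malik diffusion that it builds on. This is not the authors' module: it uses plain finite-difference gradients on a single 2-D image, whereas the paper replaces the gradient term with a wavelet transform and applies the operation inside the encoder. The function name `perona_malik`, the exponential edge-stopping function, and the parameters `n_iter`, `kappa`, and `step` are assumptions chosen for illustration.

```python
import numpy as np

def perona_malik(image, n_iter=10, kappa=0.1, step=0.2):
    """Classic (gradient-based) Perona-Malik diffusion on a 2-D image.

    Illustrative only: the wavelet-based variant described in the paper
    is not reproduced here.
    """
    u = np.asarray(image, dtype=np.float64).copy()
    for _ in range(n_iter):
        # Replicate the border so differences across it are zero (no flux).
        p = np.pad(u, 1, mode="edge")
        # Differences to the four nearest neighbours.
        diffs = (
            p[:-2, 1:-1] - u,  # north
            p[2:, 1:-1] - u,   # south
            p[1:-1, :-2] - u,  # west
            p[1:-1, 2:] - u,   # east
        )
        # Edge-stopping function g(d) = exp(-(d / kappa)^2):
        # close to 1 for small differences (noise), close to 0 across strong edges.
        flux = sum(np.exp(-(d / kappa) ** 2) * d for d in diffs)
        # Explicit update; step <= 0.25 keeps the 4-neighbour scheme stable.
        u += step * flux
    return u
```

Applied to a noisy step edge, a filter of this kind smooths the flat regions while leaving the large intensity jump almost untouched, which is the behaviour the summary attributes to WPMD (noise suppression with edge preservation).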
The authors evaluate IRSAM on three public datasets (NUAA-SIRST, IRSTD-1k, and NUDT-SIRST) and demonstrate its superior performance compared to state-of-the-art methods in both objective metrics (IoU, nIoU) and subjective evaluations. The results show that IRSAM effectively extracts structural information, handles complex target shapes, and segments multiple adjacent objects accurately. Ablation studies further validate the effectiveness of the WPMD and GAD modules, confirming their contributions to the overall performance of IRSAM.
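For readers unfamiliar with the two objective metrics, the following sketch shows one way they could be computed. It assumes the definitions commonly used in the IRSTD literature: IoU accumulates intersection and union pixels over the whole dataset, while nIoU averages the per-image IoU. The summary itself does not spell out these definitions, so the function `iou_and_niou` and its handling of empty masks are hypothetical.

```python
import numpy as np

def iou_and_niou(preds, targets, eps=1e-9):
    """Return (IoU, nIoU) for paired lists of binary masks.

    IoU  : total intersection / total union over all images.
    nIoU : mean of the per-image intersection / union.
    Assumed definitions; an image with empty prediction and empty
    target contributes a per-image IoU of 0 here.
    """
    inter_total = 0
    union_total = 0
    per_image = []
    for pred, target in zip(preds, targets):
        pred = np.asarray(pred, dtype=bool)
        target = np.asarray(target, dtype=bool)
        inter = np.logical_and(pred, target).sum()
        union = np.logical_or(pred, target).sum()
        inter_total += inter
        union_total += union
        per_image.append(inter / (union + eps))
    iou = inter_total / (union_total + eps)
    niou = np.mean(per_image)
    return float(iou), float(niou)
```

Because nIoU weights every image equally, a single missed small target lowers it more than it lowers the pixel-accumulated IoU, which is why the two numbers are usually reported together.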