IRSAM: Advancing Segment Anything Model for Infrared Small Target Detection


10 Jul 2024 | Mingjin Zhang, Yuchun Wang, Jie Guo, Yunsong Li, Xinbo Gao, and Jing Zhang
This paper proposes IRSAM, a novel model for infrared small target detection (IRSTD). IRSAM adapts the Segment Anything Model (SAM) by modifying its encoder-decoder architecture to better capture the features of infrared small targets. Specifically, a Perona-Malik diffusion (PMD)-based block, termed WPMD, is designed and integrated into SAM's encoder to suppress noise while preserving structural features, and a Granularity-Aware Decoder (GAD) fuses multi-granularity features from the encoder to enhance mask representation for targets of varying size and shape.

Built on SAM with a lightweight ViT-Tiny backbone, IRSAM outperforms state-of-the-art methods on the public NUAA-SIRST, NUDT-SIRST, and IRSTD-1K datasets in both objective metrics and subjective evaluation. The contributions of this paper are threefold: the first redesign of SAM for IRSTD, the WPMD block that preserves edge-related features while suppressing noise, and the GAD that reconstructs structural features lost in the encoder and strengthens mask representation. IRSAM addresses the core challenges of IRSTD, namely low signal-to-noise ratio and blurred target edges, and shows significant improvements in detection accuracy and efficiency across the evaluated datasets. The source code is available at github.com/IPIC-Lab/IRSAM.
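For readers unfamiliar with Perona-Malik diffusion, the sketch below shows the classical discrete anisotropic-diffusion scheme that the WPMD block builds on. This is not the authors' implementation; it is a minimal NumPy illustration, with illustrative parameters kappa (edge-stopping threshold) and gamma (step size) chosen here for images with intensities in [0, 1].

```python
import numpy as np

def perona_malik(img, n_iter=10, kappa=0.1, gamma=0.2):
    """Classical Perona-Malik anisotropic diffusion on a 2D image.

    Smooths noise in flat regions while preserving edges: the conduction
    coefficient c = exp(-(|grad I| / kappa)^2) approaches zero across strong
    gradients, so diffusion is suppressed at target boundaries.
    Assumes intensities roughly in [0, 1]; boundaries wrap for simplicity.
    """
    u = img.astype(np.float64).copy()
    for _ in range(n_iter):
        # Finite differences toward the four neighbours.
        dN = np.roll(u, 1, axis=0) - u
        dS = np.roll(u, -1, axis=0) - u
        dE = np.roll(u, -1, axis=1) - u
        dW = np.roll(u, 1, axis=1) - u
        # Edge-stopping function: near 1 in flat regions, near 0 at edges.
        cN = np.exp(-(dN / kappa) ** 2)
        cS = np.exp(-(dS / kappa) ** 2)
        cE = np.exp(-(dE / kappa) ** 2)
        cW = np.exp(-(dW / kappa) ** 2)
        # Explicit Euler update of the diffusion PDE (stable for gamma <= 0.25).
        u += gamma * (cN * dN + cS * dS + cE * dE + cW * dW)
    return u
```

In IRSAM this idea is embedded as a learnable encoder block rather than applied as a standalone preprocessing step; the sketch only conveys why PMD-style diffusion can suppress background noise without blurring the faint edges of small infrared targets.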