23 Jun 2024 | Yuxuan Li, Xiang Li, Yimian Dai, Qibin Hou, Li Liu, Yongxiang Liu, Ming-Ming Cheng, Jian Yang
The paper introduces LSKNet, a lightweight backbone network designed to address the challenges of remote sensing images, which are characterized by high resolution, random orientation, large intraclass variation, multiscale scenes, and dense small objects. LSKNet leverages the prior knowledge that accurate recognition often requires a wide range of contextual information and that the contextual information required for different objects varies significantly. The network employs a dynamic modulation of the receptive field within the feature extraction backbone, allowing it to efficiently model diverse and wide-ranging contexts. This is achieved through a spatial selective mechanism that weights features processed by a sequence of large depth-wise kernels and then spatially merges them. The weights of these kernels are determined dynamically based on the input, enabling the model to use different large kernels adaptively and adjust the receptive field for each object as needed.
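The selection mechanism described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the kernel sizes (5x5, then 7x7 with dilation 3), the mean/max channel pooling used as the spatial descriptor, the sigmoid gating, and the 1x1 mixing matrix `mix` are all assumptions chosen for illustration, and the names `depthwise_conv2d` and `spatial_select` are hypothetical. The sketch only shows the general idea: features from a sequence of depth-wise kernels are weighted by input-dependent spatial masks and then merged.

```python
import numpy as np

def depthwise_conv2d(x, kernels, dilation=1):
    """Per-channel 'same' cross-correlation. x: (C, H, W), kernels: (C, k, k), k odd."""
    c, h, w = x.shape
    k = kernels.shape[-1]
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    out = np.zeros(x.shape, dtype=float)
    for i in range(k):
        for j in range(k):
            di, dj = i * dilation, j * dilation
            out += kernels[:, i, j, None, None] * xp[:, di:di + h, dj:dj + w]
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spatial_select(x, kernels_small, kernels_large, mix, dilation=3):
    """Weight features from two stacked depth-wise kernels with per-pixel masks."""
    # Sequence of depth-wise convolutions: the second stage sees a wider context.
    u1 = depthwise_conv2d(x, kernels_small)
    u2 = depthwise_conv2d(u1, kernels_large, dilation=dilation)
    feats = np.stack([u1, u2])                               # (2, C, H, W)
    # Input-dependent spatial descriptor (assumed: channel-wise mean and max).
    flat = feats.reshape(-1, *x.shape[1:])                   # (2C, H, W)
    desc = np.stack([flat.mean(axis=0), flat.max(axis=0)])   # (2, H, W)
    # Map the descriptor to one mask per kernel (assumed: 1x1 mixing + sigmoid).
    masks = sigmoid(np.einsum("nd,dhw->nhw", mix, desc))     # (2, H, W)
    # Spatially merge: each pixel blends the two receptive fields.
    merged = (masks[:, None] * feats).sum(axis=0)            # (C, H, W)
    return merged, masks

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16, 16))        # toy feature map, 4 channels
k5 = rng.standard_normal((4, 5, 5)) * 0.1   # small depth-wise kernels
k7 = rng.standard_normal((4, 7, 7)) * 0.1   # large, dilated depth-wise kernels
mix = rng.standard_normal((2, 2))           # hypothetical mixing weights
merged, masks = spatial_select(x, k5, k7, mix)
print(merged.shape, masks.shape)
```

Because the masks are computed from the input features, the blend between the narrower and the wider receptive field changes from pixel to pixel, which is the sense in which the receptive field is adjusted per object. With the sizes assumed here, the stacked kernels cover 5 + (7 - 1) * 3 = 23 pixels in each direction.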
LSKNet is evaluated on various remote sensing tasks, including scene classification, object detection, semantic segmentation, and change detection, on 14 widely used public datasets. The results show that LSKNet achieves state-of-the-art performance without the need for complex feature ensembles or large models. The paper also includes a comprehensive analysis to validate the effectiveness and significance of the proposed model, highlighting the importance of the identified priors in remote sensing images. The code for LSKNet is available on GitHub.