HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model

17 Jun 2024 | Di Wang*, Meiqi Hu*, Yao Jin*, Yuchun Miao*, Jiaqi Yang*, Yichu Xu*, Xiaolei Qin*, Jiaqi Ma*, Lingyu Sun*, Chenxing Li*, Chuan Fu, Hongruixuan Chen, Chengxi Han†, Naoto Yokoya, Member, IEEE, Jing Zhang†, Senior Member, IEEE, Minqiang Xu, Lin Liu, Lefei Zhang, Senior Member, IEEE, Chen Wu†, Member, IEEE, Bo Du†, Senior Member, IEEE, Dacheng Tao, Fellow, IEEE and Liangpei Zhang†, Fellow, IEEE
HyperSIGMA is a vision transformer-based foundation model for hyperspectral image (HSI) interpretation that scales to over a billion parameters. The paper presents it as the first foundation model designed specifically for HSI interpretation, offering a unified solution to both high-level and low-level tasks. To address the spectral and spatial redundancy of HSIs, it introduces a sparse sampling attention (SSA) mechanism that encourages the learning of diverse contextual features, and it integrates spatial and spectral features through a purpose-built spectral enhancement module. Pre-training uses HyperGlobal-450K, a large-scale dataset of about 450K hyperspectral images. The code and models will be released at HyperSIGMA.

The main contributions are the construction of HyperGlobal-450K, the development of HyperSIGMA, the proposal of SSA, and extensive experiments on HyperSIGMA's performance. The paper also reviews related work on HSI processing, remote sensing foundation models, large-scale remote sensing datasets, and self-attention mechanisms.

The methodology has three main steps: initializing model weights through pre-training, enhancing the model structure with SSA, and fusing spatial-spectral features. The model is evaluated on high-level HSI tasks (image classification, target detection, anomaly detection, and change detection) and on low-level tasks (spectral unmixing, image denoising, and image super-resolution). According to the authors, these experiments demonstrate HyperSIGMA's versatility and superior representational capability compared to current state-of-the-art methods, along with advantages in scalability, robustness, cross-modal transfer capability, and real-world applicability.
The results show that HyperSIGMA outperforms existing methods in terms of accuracy, robustness, and scalability.
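
To make the idea behind sparse sampling attention more concrete, the sketch below shows one simple way a query can attend to a small sampled subset of tokens instead of all of them. It is an illustration under stated assumptions, not the authors' implementation: the summary does not say how SSA chooses its sampling positions, so uniform random sampling is used here as a stand-in, and the function name `sparse_sampling_attention`, the single-head layout, and all parameters are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sparse_sampling_attention(tokens, w_q, w_k, w_v, num_samples, rng):
    """Single-head attention in which each query sees only `num_samples` tokens.

    tokens: (n, d) array of token embeddings.
    w_q, w_k, w_v: (d, d) projection matrices.
    Assumption: sampled positions are drawn uniformly at random per query;
    the sampling rule used by HyperSIGMA's SSA is not given in this summary.
    """
    n, d = tokens.shape
    q = tokens @ w_q
    k = tokens @ w_k
    v = tokens @ w_v

    # For every query, pick `num_samples` distinct token indices.
    idx = np.stack(
        [rng.choice(n, size=num_samples, replace=False) for _ in range(n)]
    )  # shape (n, num_samples)

    k_s = k[idx]  # (n, num_samples, d): keys gathered per query
    v_s = v[idx]  # (n, num_samples, d): values gathered per query

    # Scaled dot-product scores against the sampled keys only.
    scores = np.einsum("nd,nsd->ns", q, k_s) / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    out = np.einsum("ns,nsd->nd", weights, v_s)
    return out, weights, idx


rng = np.random.default_rng(0)
n, d, s = 64, 16, 8  # 64 tokens, 16 channels, 8 sampled tokens per query
tokens = rng.standard_normal((n, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
out, weights, idx = sparse_sampling_attention(tokens, w_q, w_k, w_v, s, rng)
print(out.shape, weights.shape)  # (64, 16) (64, 8)
```

In this sketch each query scores only `s` keys, so the attention cost grows with `n * s` rather than `n * n`. When neighbouring pixels or bands carry largely redundant information, as the summary says of HSIs, skipping most keys loses little while spreading attention over more varied context.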