Understanding HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model

HyperSIGMA is a vision transformer-based foundation model for hyperspectral image (HSI) interpretation that scales to over a billion parameters. The paper presents it as the first foundation model designed specifically for HSI interpretation, offering a unified solution to both high-level and low-level tasks. To address the spectral and spatial redundancy of HSIs, it introduces a sparse sampling attention (SSA) mechanism that promotes the learning of diverse contextual features, and it integrates spatial and spectral features through a purpose-built spectral enhancement module. Pre-training uses HyperGlobal-450K, a large-scale dataset of about 450K hyperspectral images. The code and models will be released at HyperSIGMA.

The main contributions are the construction of HyperGlobal-450K, the development of HyperSIGMA, the proposal of SSA, and extensive experiments demonstrating HyperSIGMA's performance. The paper also reviews related work on HSI processing, remote sensing foundation models, large-scale remote sensing datasets, and self-attention mechanisms.

The methodology has three main steps: initializing model weights through pre-training, enhancing the model structure with SSA, and fusing spatial-spectral features. The model is evaluated on high-level tasks (image classification, target detection, anomaly detection, and change detection) and low-level tasks (spectral unmixing, image denoising, and image super-resolution). Across these experiments, the paper reports versatility and stronger representational capability than current state-of-the-art methods, along with advantages in scalability, robustness, cross-modal transfer, and real-world applicability.
The results show that HyperSIGMA outperforms existing methods in terms of accuracy, robustness, and scalability.
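
The summary does not spell out how SSA selects the positions it attends to, so the following is only a toy sketch of the general idea behind sparse attention: each query token attends to a small sampled subset of tokens rather than to all of them. The uniform random sampling, the function name sparse_sampling_attention, and the token sizes are all assumptions made for illustration and are not taken from the paper or its code.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sparse_sampling_attention(queries, keys, values, num_samples, rng):
    """Each query attends to `num_samples` sampled tokens instead of all tokens.

    Assumption: positions are drawn uniformly at random purely for
    illustration; a learned sampling strategy would replace `rng.sample`.
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    value_dim = len(values[0])
    outputs, sampled = [], []
    for q in queries:
        # Pick a sparse subset of key/value positions for this query.
        idx = sorted(rng.sample(range(len(keys)), num_samples))
        # Scaled dot-product attention restricted to the sampled positions.
        weights = softmax([dot(q, keys[i]) * scale for i in idx])
        out = [
            sum(w * values[i][d] for w, i in zip(weights, idx))
            for d in range(value_dim)
        ]
        outputs.append(out)
        sampled.append(idx)
    return outputs, sampled


# Hypothetical input: 16 tokens with 8 features each (self-attention).
rng = random.Random(0)
tokens = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(16)]
outputs, sampled = sparse_sampling_attention(
    tokens, tokens, tokens, num_samples=4, rng=rng
)
```

In this sketch each query computes 4 similarity scores instead of 16, and setting num_samples to the full token count recovers ordinary dense attention. That trade-off between cost and context coverage is the part of the idea the sketch is meant to convey.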

HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model