Pyramid Hierarchical Transformer for Hyperspectral Image Classification

2024 | Muhammad Ahmad, Muhammad Hassaan Farooq Butt, Manuel Mazzara, Salvatore Distifano
The paper introduces PyFormer, a pyramid hierarchical Transformer for hyperspectral image classification (HSIC). Traditional Transformers struggle with variable-length input sequences, which leads to inefficiency and scalability problems in HSIC. PyFormer addresses this with a pyramid-based hierarchical structure that organizes the input into segments, each representing a different level of abstraction. Dedicated transformer modules at each level capture both local and global context, improving processing efficiency. Spatial and spectral information flows within the hierarchy, facilitating communication and the propagation of abstractions, and the outputs of the different levels are integrated to form the final representation.

PyFormer is designed to handle the central challenges of HSIC: the computational demands of self-attention mechanisms and the need for large amounts of training data. The architecture begins with convolutional layers that extract spatial-spectral features; the pyramid structure then partitions these features into segments, and each level applies a transformer module (attention followed by a feedforward network) for multi-level processing. The final class probabilities are produced by a softmax activation.

The model is evaluated on three datasets: Pavia University (PU), Salinas (SA), and University of Houston (UH), where it achieves kappa accuracies of 99.73%, 99.84%, and 98.01%, respectively, outperforming state-of-the-art models such as ViT, SpectralFormer, HiT, CSiT, and WaveFormer. Incorporating disjoint training and test samples further improves robustness and reliability. The results demonstrate that PyFormer is effective for HSIC, particularly in scenarios with limited training data, and extensive experiments confirm its robustness and generalizability. Future research could explore techniques such as self-supervised pre-training and network optimizations to further enhance PyFormer's performance.
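To make the pyramid idea concrete, here is a minimal NumPy sketch of the multi-level processing described above: the input token sequence is split into progressively finer segments, each segment is processed by a (heavily simplified, identity-projection) self-attention step standing in for a transformer module, and the per-level outputs are pooled and concatenated into one integrated representation. This is an illustrative assumption of the general scheme, not the authors' implementation; the function names, the two-way split per level, and the mean-pooling fusion are all choices made for this sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    # single-head attention with identity Q/K/V projections (illustrative only)
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ tokens

def pyramid_levels(seq, num_levels=3):
    # level k partitions the sequence into 2**k segments (coarse -> fine)
    return [np.array_split(seq, 2 ** k) for k in range(num_levels)]

def pyformer_sketch(seq, num_levels=3):
    # attend within each segment, pool each level, then fuse all levels
    fused = []
    for segments in pyramid_levels(seq, num_levels):
        outs = [self_attention(s).mean(axis=0) for s in segments]
        fused.append(np.mean(outs, axis=0))
    return np.concatenate(fused)  # integrated multi-level representation

# 16 spectral tokens of dimension 8 standing in for one pixel's patch features
bands = np.random.default_rng(0).normal(size=(16, 8))
feat = pyformer_sketch(bands)
print(feat.shape)  # (24,) = 3 levels x feature dim 8
```

In the actual model, each level would use a full transformer module (learned projections, multi-head attention, feedforward network) and the fused representation would feed a softmax classification head.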