QKFormer: Hierarchical Spiking Transformer using Q-K Attention

25 Mar 2024 | Chenlin Zhou, Han Zhang, Zhaokun Zhou, Liutao Yu, Liwei Huang, Xiaopeng Fan, Li Yuan, Zhengyu Ma, Huihui Zhou, Yonghong Tian
QKFormer is a hierarchical spiking transformer that uses Q-K attention to improve the performance of spiking neural networks (SNNs). The model introduces a novel spike-form Q-K attention mechanism that models the importance of the token or channel dimension through binary vectors with linear complexity. It also incorporates a hierarchical structure to obtain multi-scale spiking representations and a versatile patch embedding module with a deformed shortcut.

QKFormer significantly outperforms existing state-of-the-art SNN models across benchmarks. On ImageNet-1K it reaches a top-1 accuracy of 85.65% with a model size of 64.96M parameters, surpassing Spikformer by 10.84%. The model is trained directly, without pre-training, and its sparse spike-form operations make it highly energy efficient. The hierarchical structure and linear-complexity Q-K attention make it feasible to explore large-scale hierarchical SNN models. QKFormer also outperforms other SNN models on both static and neuromorphic datasets, achieving high accuracy on CIFAR10, CIFAR100, and event-based datasets such as CIFAR10-DVS and DVS128 Gesture. Its performance is further validated through ablation studies and experiments with different time steps and spiking neuron models. QKFormer represents a significant advancement for SNNs, demonstrating the potential of direct training and hierarchical spiking representations for achieving both high accuracy and efficiency.
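To make the linear-complexity claim concrete, here is a minimal NumPy sketch of how a spike-form Q-K *token* attention could work. This is an illustrative reconstruction, not the paper's exact implementation: the function name `qk_token_attention`, the use of a plain Heaviside step in place of a trained spiking neuron (e.g. LIF), and the `threshold` value are all assumptions. The key idea it demonstrates is that summing the binary query spikes over the channel dimension yields a per-token score that is spiked into a binary mask and applied to K, avoiding the quadratic N×N attention map.

```python
import numpy as np

def heaviside(x):
    # Stand-in for a spiking neuron's firing function: emits a spike (1)
    # when the input exceeds zero. The real model would use a spiking
    # neuron with a surrogate gradient for training.
    return (x > 0).astype(np.float32)

def qk_token_attention(Q, K, threshold=1.0):
    """Illustrative spike-form Q-K token attention with linear complexity.

    Q, K: binary spike matrices of shape (N, D) -- N tokens, D channels.
    Rather than forming an N x N attention map (quadratic in N), the
    query spikes are summed over channels to score each token, the
    scores are spiked into a binary token mask, and the mask gates K.
    Total cost is O(N * D).
    """
    token_scores = Q.sum(axis=1, keepdims=True)   # (N, 1) spike counts per token
    mask = heaviside(token_scores - threshold)    # (N, 1) binary token importance
    return mask * K                               # (N, D), output stays binary

# Toy usage with random binary spike tensors
rng = np.random.default_rng(0)
Q = (rng.random((4, 8)) > 0.7).astype(np.float32)
K = (rng.random((4, 8)) > 0.7).astype(np.float32)
out = qk_token_attention(Q, K)
```

Because Q, K, and the mask are all binary, the elementwise product reduces to masking rows of K, which is why the operation remains spike-form and energy efficient on neuromorphic hardware.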