QKFormer: Hierarchical Spiking Transformer using Q-K Attention

25 Mar 2024 | Chenlin Zhou, Han Zhang, Zhaokun Zhou, Liutao Yu, Liwei Huang, Xiaopeng Fan, Li Yuan, Zhengyu Ma, Huihui Zhou, Yonghong Tian
QKFormer is a novel hierarchical spiking transformer designed to improve the performance of spiking neural networks (SNNs) by integrating them with Transformer architectures. Its key contributions are:

1. **Q-K Attention Mechanism**: A novel spike-form Q-K attention mechanism tailored for SNNs, which models the importance of the token or channel dimension through binary vectors with linear complexity, using only two spike-form components: Query ($Q$) and Key ($K$) (a minimal sketch follows below).
2. **Hierarchical Structure**: A hierarchical structure for spiking transformers that yields multi-scale spiking representations, a property beneficial in both biological and artificial neural networks.
3. **Patch Embedding Module**: A versatile and powerful patch embedding module with a deformed shortcut designed specifically for spiking transformers, which enhances spiking information transmission and improves performance.

QKFormer is trained directly and outperforms existing state-of-the-art SNN models on a range of datasets. Notably, it achieves a top-1 accuracy of 85.65% on ImageNet-1k with 4 time steps, surpassing the previous best model by 10.84%. The code and models are publicly available at <https://github.com/zhouchenlin2096/QKFormer>.
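
To make the Q-K attention idea concrete, below is a minimal PyTorch-style sketch of spike-form Q-K attention over the token dimension. This is an illustrative assumption, not the authors' implementation: the `spike` function is a crude Heaviside stand-in for the LIF neurons (trained with surrogate gradients) used in the paper, and the module and parameter names are hypothetical.

```python
# Minimal sketch of spike-form Q-K token attention (illustrative only).
# Assumes a simple Heaviside threshold as a stand-in for a spiking (LIF) neuron.
import torch
import torch.nn as nn


def spike(x: torch.Tensor, threshold: float = 1.0) -> torch.Tensor:
    """Binarize activations: a crude stand-in for a spiking neuron."""
    return (x >= threshold).float()


class QKTokenAttention(nn.Module):
    """Q-K attention over the token dimension with linear complexity.

    Only two spike-form components (Q and K) are used: Q is reduced over
    the channel dimension to a binary token-importance vector, which then
    gates K element-wise. No Q.K^T matrix is formed, so cost is O(N * D).
    """

    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim, bias=False)
        self.k_proj = nn.Linear(dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim), assumed already spike-form (binary)
        q = spike(self.q_proj(x))                              # (B, N, D) binary
        k = spike(self.k_proj(x))                              # (B, N, D) binary
        token_importance = spike(q.sum(dim=-1, keepdim=True))  # (B, N, 1) binary
        return token_importance * k                            # gate K per token


if __name__ == "__main__":
    x = torch.randint(0, 2, (2, 16, 64)).float()  # binary input spikes
    attn = QKTokenAttention(dim=64)
    print(attn(x).shape)  # torch.Size([2, 16, 64])
```

Because the binary token-importance vector gates $K$ element-wise, no $N \times N$ attention matrix is ever materialized, which is where the linear complexity in the number of tokens comes from; a channel-wise variant reduces $Q$ over tokens instead.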