SpikingResformer: Bridging ResNet and Vision Transformer in Spiking Neural Networks

28 Mar 2024 | Xinyu Shi¹,², Zecheng Hao², Zhaofei Yu¹,²*
SpikingResformer introduces a novel spiking self-attention mechanism, Dual Spike Self-Attention (DSSA), to bridge ResNet and Vision Transformer architectures in Spiking Neural Networks (SNNs). DSSA uses a Dual Spike Transformation to produce spiking self-attention without any floating-point operations, making it compatible with the spike-driven computation of SNNs, and includes a scaling factor that lets it handle feature maps of arbitrary scale. Built on DSSA, SpikingResformer combines a ResNet-style multi-stage architecture with spiking self-attention to improve performance and energy efficiency while using fewer parameters.

Experimental results show that SpikingResformer outperforms existing spiking Vision Transformers in accuracy, parameter count, and energy consumption. SpikingResformer-L achieves 79.40% top-1 accuracy on ImageNet with 4 time steps, a state-of-the-art result in the SNN field. The architecture also demonstrates strong transfer learning capability on both static and neuromorphic datasets. By introducing a scalable, spike-driven self-attention mechanism, the proposed method addresses the limitations of existing spiking self-attention designs and enables efficient and effective processing in SNNs.
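The summary above describes DSSA only at a high level. The minimal PyTorch sketch below illustrates the general idea of spike-driven attention: because both operands of each matrix product are binary spike maps, the product reduces to accumulation, and a scaling factor normalizes for the feature map size. The function names (dual_spike_transformation, heaviside_spike), the single-threshold spiking activation, and the 1/d and 1/n scales are illustrative assumptions, not the authors' exact formulation, which also involves learned projections, batch normalization, and multi-time-step spiking neurons.

```python
import torch


def heaviside_spike(v: torch.Tensor, threshold: float = 1.0) -> torch.Tensor:
    """Simple integrate-and-fire step (stand-in for a spiking neuron layer):
    emit a spike wherever the accumulated input crosses the threshold."""
    return (v >= threshold).float()


def dual_spike_transformation(spikes_a: torch.Tensor,
                              spikes_b: torch.Tensor,
                              scale: float) -> torch.Tensor:
    """Hypothetical Dual Spike Transformation sketch: multiply two binary
    spike tensors, so the product is pure accumulation with no float
    multiply on analog activations, then rescale by `scale`, which plays
    the role of the paper's scaling factor for arbitrary feature sizes."""
    # spikes_a: (B, N, D), spikes_b: (B, D, M); both contain only 0/1 values.
    return torch.bmm(spikes_a, spikes_b) * scale


def dual_spike_self_attention(x_spikes: torch.Tensor) -> torch.Tensor:
    """Minimal DSSA-style block: the attention map and the output are both
    produced by spike-spike products followed by a spiking activation."""
    b, n, d = x_spikes.shape
    attn_logits = dual_spike_transformation(
        x_spikes, x_spikes.transpose(1, 2), scale=1.0 / d)   # (B, N, N)
    attn_spikes = heaviside_spike(attn_logits)                # binary attention map
    out_logits = dual_spike_transformation(
        attn_spikes, x_spikes, scale=1.0 / n)                 # (B, N, D)
    return heaviside_spike(out_logits)


if __name__ == "__main__":
    x = (torch.rand(2, 16, 32) > 0.8).float()      # random binary spike input
    print(dual_spike_self_attention(x).shape)      # torch.Size([2, 16, 32])
```

Because every operand entering a matrix product is binary, this sketch stays spike-driven end to end; the scaling factor is what keeps the accumulated values in a range the spiking activation can threshold regardless of the token count or channel width.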