28 Mar 2024 | Xinyu Shi, Zecheng Hao, Zhaofei Yu*
The paper "SpikingResformer: Bridging ResNet and Vision Transformer in Spiking Neural Networks" introduces Dual Spike Self-Attention (DSSA), a novel spiking self-attention mechanism, and SpikingResformer, a spiking Vision Transformer architecture built on it. DSSA is fully spike-driven and compatible with Spiking Neural Networks (SNNs), addressing the lack of reasonable scaling methods in existing spiking self-attention mechanisms. SpikingResformer combines a ResNet-style multi-stage architecture with DSSA to improve performance and energy efficiency while reducing parameter count. Experiments show that SpikingResformer achieves higher accuracy with fewer parameters and lower energy consumption than other spiking Vision Transformers. Notably, SpikingResformer-L reaches 79.40% top-1 accuracy on ImageNet with 4 time-steps, a state-of-the-art result in the SNN field. The paper also analyzes DSSA's scaling factors and its spike-driven characteristic in detail, demonstrating its effectiveness and superiority over existing methods.
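The spike-driven property mentioned above means attention is computed from binary spike matrices, so matrix products reduce to accumulations rather than analog multiplications. The following is a minimal, hypothetical sketch of one time-step of such a mechanism; the weight shapes, Heaviside firing rule, and the 1/sqrt(.) scalings are illustrative stand-ins, not the paper's exact DSSA formulation or its derived scaling factors.

```python
import numpy as np

rng = np.random.default_rng(0)

def spike(x, v_th=1.0):
    """Heaviside firing: emit a binary spike wherever input crosses threshold."""
    return (x >= v_th).astype(np.float32)

# Hypothetical sizes: 8 tokens, 16 channels, a single time-step for brevity.
N, D = 8, 16
X = (rng.random((N, D)) < 0.2).astype(np.float32)  # binary input spikes
Wa = rng.normal(0.0, 0.5, (D, D))                  # illustrative linear weights
Wv = rng.normal(0.0, 0.5, (D, D))

# Both operands of each attention matmul are binary spike matrices, so the
# products are pure accumulations -- the spike-driven property the summary
# refers to. The 1/sqrt(.) factors stand in for the paper's scaling analysis.
Xa = spike(X @ Wa)                     # spiking attention features
V = spike(X @ Wv)                      # spiking value features
attn = spike((Xa @ X.T) / np.sqrt(D))  # binary attention map (N x N)
out = spike((attn @ V) / np.sqrt(N))   # spiking output features (N x D)
```

Because every intermediate tensor is re-binarized by the spiking neuron, the whole pipeline remains compatible with event-driven hardware, which is the source of the energy savings the summary reports.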