2 Jan 2024 | Yongqi Ding, Lin Zuo*, Mengmeng Jing, Pei He, Yongjun Xiao
This paper addresses the challenge of low-latency neuromorphic object recognition using spiking neural networks (SNNs). Existing SNNs suffer from significant latency, often requiring 10 to 40 timesteps or more to recognize objects, which degrades performance at low latencies. To overcome this, the authors propose the Shrinking SNN (SSNN), which reduces temporal redundancy by dividing the SNN into multiple stages with progressively shrinking timesteps, significantly reducing inference latency. A temporal transformer is employed to smoothly transform the temporal scale and preserve information during timestep shrinkage. Additionally, multiple early classifiers are added during training to mitigate the mismatch between surrogate and true gradients and to prevent gradient vanishing/exploding, thereby improving performance at low latencies. Extensive experiments on neuromorphic datasets (CIFAR10-DVS, N-Caltech101, and DVS-Gesture) demonstrate that SSNN can improve baseline accuracy by 6.55% to 21.41% at very low latencies, achieving an accuracy of 73.63% on CIFAR10-DVS with only 5 average timesteps. The work provides valuable insights into developing high-performance, low-latency SNNs with heterogeneous temporal scales.
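To make the progressive-shrinkage idea concrete, the following is a minimal, hypothetical sketch of merging adjacent timesteps between stages. It uses simple temporal averaging as a stand-in for the paper's temporal transformer (the actual transform is learned and more sophisticated); the function name and the toy spike tensor are illustrative assumptions, not the authors' code.

```python
import numpy as np

def shrink_timesteps(x, factor=2):
    """Merge groups of `factor` adjacent timesteps by averaging.

    Hypothetical stand-in for SSNN's temporal transformer: it halves
    (or more generally divides) the temporal dimension between stages,
    so later stages run with fewer timesteps than earlier ones.
    """
    T = x.shape[0]
    assert T % factor == 0, "timestep count must be divisible by factor"
    # (T, ...) -> (T // factor, factor, ...) -> average over each group
    return x.reshape(T // factor, factor, *x.shape[1:]).mean(axis=1)

# Toy binary spike tensor: (timesteps, channels, height, width)
rng = np.random.default_rng(0)
spikes = (rng.random((8, 3, 4, 4)) > 0.5).astype(float)

stage1 = shrink_timesteps(spikes, factor=2)  # 8 -> 4 timesteps
stage2 = shrink_timesteps(stage1, factor=2)  # 4 -> 2 timesteps
print(stage1.shape, stage2.shape)  # (4, 3, 4, 4) (2, 3, 4, 4)
```

Because later stages process fewer timesteps, the average number of timesteps over the whole network (and hence inference latency) drops well below the initial value, which is how SSNN reaches figures like 5 average timesteps.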