Spiking-PhysFormer: Camera-Based Remote Photoplethysmography with Parallel Spike-driven Transformer

February 12, 2024 | Mingxuan Liu, Jiankai Tang, Haoxiang Li, Jiahao Qi, Siwei Li, Kegang Wang, Yuntao Wang, Hong Chen
The paper introduces Spiking-PhysFormer, a hybrid neural network (HNN) that integrates spiking neural networks (SNNs) with a transformer architecture for camera-based remote photoplethysmography (rPPG). The goal is to reduce power consumption while maintaining or improving performance. Spiking-PhysFormer consists of an ANN-based patch embedding block, SNN-based transformer blocks, and an ANN-based predictor head. Key contributions include the parallel spike-driven transformer, which combines temporal difference convolution (TDC) with spike-driven self-attention (SDSA), and simplified spiking self-attention (S3A), which omits the value parameter. Experiments on four datasets (PURE, UBFC-rPPG, UBFC-Phys, and MMPD) show that Spiking-PhysFormer reduces power consumption by 12.4% compared to PhysFormer, with the transformer block requiring 12.2 times less computational energy, while maintaining performance on par with ANN-based models. The model's spatio-temporal attention map highlights its ability to focus on facial regions and identify pulse wave peaks, demonstrating both interpretability and effectiveness.
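To make the S3A idea concrete, below is a minimal PyTorch sketch of a spiking self-attention block with the value branch omitted, so attention reduces to spike-form queries and keys combined through sums and element-wise masking. The spike function, surrogate gradient, tensor layout, and module names here are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch of simplified spiking self-attention (S3A): spike-form
# Q and K only, no V. Assumptions: binary spike tensors, a rectangular
# surrogate gradient, and a (batch, tokens, dim) layout.
import torch
import torch.nn as nn


class HeavisideSpike(torch.autograd.Function):
    """Binary spike activation with a rectangular surrogate gradient."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return (x > 0).float()

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # Surrogate: pass gradients only near the firing threshold.
        return grad_out * (x.abs() < 0.5).float()


class S3A(nn.Module):
    """Simplified spiking self-attention: Q and K branches, V omitted."""

    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim, bias=False)
        self.k_proj = nn.Linear(dim, dim, bias=False)

    def forward(self, x):
        # x: (batch, tokens, dim) spike tensor from the previous block.
        q = HeavisideSpike.apply(self.q_proj(x))  # spike-form queries
        k = HeavisideSpike.apply(self.k_proj(x))  # spike-form keys
        # A token-wise sum over K replaces the softmax(QK^T)V product;
        # the element-wise product with Q keeps the block spike-driven
        # (additions and masking only, no dense attention matrix).
        attn = HeavisideSpike.apply(k.sum(dim=1, keepdim=True))
        return q * attn  # (batch, tokens, dim)


if __name__ == "__main__":
    x = (torch.rand(2, 16, 64) > 0.5).float()  # toy binary spike input
    print(S3A(64)(x).shape)                    # torch.Size([2, 16, 64])
```

Dropping the value projection is what shrinks the block's energy cost: with binary activations, the remaining operations are accumulations rather than multiply-accumulates, which is consistent with the reported 12.2x reduction in the transformer block's computational energy.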