28 Mar 2024 | Yue Gao, Senior Member, IEEE, Jiaxuan Lu, Siqi Li, Yipeng Li, Shaoyi Du, Member, IEEE
The paper introduces HyperMV, a novel framework for multi-view event-based action recognition. Event cameras, inspired by biological vision sensors, offer high temporal resolution and low power consumption, making them well suited to capturing actions from multiple viewpoints. However, existing methods often suffer from information deficit and semantic misalignment when handling multi-view event data. To address these issues, HyperMV converts the discrete event stream into frame-like representations and extracts view-related features with a shared convolutional network. It then constructs a multi-view hypergraph neural network that captures relationships across viewpoints and temporal segments, building hyperedges with both rule-based and KNN-based strategies, and applies a vertex attention mechanism for enhanced feature fusion. The authors also introduce THU^MV-EACT_50, the largest multi-view event-based action dataset to date, comprising 50 actions captured from 6 viewpoints. Experimental results show that HyperMV significantly outperforms baselines in both cross-subject and cross-view scenarios and surpasses state-of-the-art frame-based multi-view action recognition methods. The paper thus contributes both a comprehensive benchmark and a robust framework for multi-view event-based action recognition.
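To make the hypergraph construction and vertex-attention fusion more concrete, the sketch below shows one plausible way to link per-view, per-segment features with rule-based and KNN-based hyperedges and then pool them with attention. This is a minimal illustration under stated assumptions, not the authors' implementation: the function and class names (`build_hyperedges`, `VertexAttentionFusion`), the view-major vertex ordering, and the use of cosine similarity for the KNN hyperedges are all assumptions introduced here for clarity.

```python
# Hedged sketch of multi-view hypergraph construction and vertex-attention fusion.
# All names and design details here are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


def build_hyperedges(features, num_views, num_segments, knn_k=4):
    """Build an incidence matrix H (vertices x hyperedges) over V*T vertices.

    features: (V*T, D) tensor, one feature per (view, segment) vertex,
              ordered view-major: index = v * num_segments + t (an assumption).
    Rule-based hyperedges: one per segment (all views at the same time step)
    and one per view (all segments from the same camera).
    KNN-based hyperedges: one per vertex, linking it to its k nearest
    neighbours in feature space (cosine similarity, assumed here).
    """
    num_vertices = num_views * num_segments
    edges = []

    # Rule-based: same temporal segment across all views.
    for t in range(num_segments):
        edges.append([v * num_segments + t for v in range(num_views)])

    # Rule-based: same view across all temporal segments.
    for v in range(num_views):
        edges.append([v * num_segments + t for t in range(num_segments)])

    # KNN-based: feature-similarity neighbourhoods.
    normed = F.normalize(features, dim=1)
    sim = normed @ normed.t()                      # (N, N) cosine similarity
    knn_idx = sim.topk(knn_k + 1, dim=1).indices   # includes the vertex itself
    for i in range(num_vertices):
        edges.append(knn_idx[i].tolist())

    # Dense incidence matrix: H[i, e] = 1 if vertex i belongs to hyperedge e.
    H = torch.zeros(num_vertices, len(edges))
    for e, members in enumerate(edges):
        H[members, e] = 1.0
    return H


class VertexAttentionFusion(nn.Module):
    """Attention-weighted pooling of vertex features into one clip-level feature."""

    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x):                               # x: (N, D)
        weights = torch.softmax(self.score(x), dim=0)   # (N, 1) attention weights
        return (weights * x).sum(dim=0)                 # (D,) fused feature


if __name__ == "__main__":
    V, T, D = 6, 8, 256                        # 6 viewpoints, 8 temporal segments (assumed)
    feats = torch.randn(V * T, D)              # per-vertex features from a shared CNN
    H = build_hyperedges(feats, V, T)
    print(H.shape)                             # torch.Size([48, 62]): 8 + 6 + 48 hyperedges
    fused = VertexAttentionFusion(D)(feats)
    print(fused.shape)                         # torch.Size([256])
```

In this sketch, the incidence matrix `H` would feed a hypergraph convolution layer, while the attention module replaces plain averaging so that more informative viewpoints and segments contribute more to the fused representation; the actual layer definitions and hyperparameters in HyperMV may differ.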