Event-Based Eye Tracking. AIS 2024 Challenge Survey

17 Apr 2024 | Zuowen Wang, Chang Gao, Zongwei Wu, Marcos V. Conde, Radu Timofte, Shih-Chii Liu, Qinyu Chen, Zheng-jun Zha, Wei Zhai, Han Han, Bohao Liao, Yuliang Wu, Zengyu Wan, Zhong Wang, Yang Cao, Ganchao Tan, Jinze Chen, Yan Ru Pei, Sasskia Brüers, Sébastien Crouzet, Douglas McLelland, Oliver Coenen, Baocheng Zhang, Yizhao Gao, Jingyuan Li, Hayden Kwok-Hay So, Philippe Bich, Chiara Boretti, Luciano Prono, Mircea Lică, David Dinucu-Jianu, Cătălin Griu, Xiaopeng Lin, Hongwei Ren, Bojun Cheng, Xinan Zhang, Valentin Vial, Anthony Yeazzi, James Tsai
This survey reviews the AIS 2024 Event-Based Eye Tracking (EET) Challenge, which focuses on processing eye movements recorded with event cameras and predicting the pupil center. The challenge emphasizes efficient eye tracking, seeking a good balance between accuracy and computational cost. During the challenge, 38 participants registered for the Kaggle competition, and 8 teams submitted detailed fact sheets. The survey reviews and analyzes the novel and diverse methods submitted in order to advance future event-based eye tracking research.

The development of augmented reality (AR) and virtual reality (VR) technologies has increased the demand for precise and efficient eye-tracking systems. Eye tracking and related tasks also have significant potential in wearable healthcare technology, offering new approaches for diagnosing and monitoring conditions such as Parkinson’s and Alzheimer’s disease through eye movement patterns. Event cameras, also known as Dynamic Vision Sensors (DVS), provide a unique sensory modality for eye-tracking applications on mobile devices. Unlike traditional cameras, event cameras asynchronously record intensity changes that exceed a threshold, producing sparse spatiotemporal event streams. This sparsity can significantly reduce computation and energy demands, making event cameras well suited to mobile platforms.

The challenge explores algorithms for event-based eye tracking on the 3ET+ dataset, which contains real events recorded with a DVXplorer Mini event camera. The dataset includes 13 subjects performing five classes of activities: random movements, saccades, reading text, smooth pursuit, and blinks. The primary evaluation metric is p-accuracy, the fraction of predictions whose estimated pupil center lies within a tolerance of p = 10 pixels of the ground truth. The challenge was divided into three phases: preparation, the Kaggle competition, and submission of fact sheets.
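As a minimal illustration of the tolerance-based metric, the sketch below computes the fraction of frames whose predicted pupil center falls within p pixels of the ground truth, assuming predictions and labels are given as (x, y) pixel coordinates; the official evaluation script may handle additional details (e.g., label resolution or excluded frames) differently.

```python
import numpy as np

def p_accuracy(pred, target, tolerance=10):
    """Fraction of samples whose predicted pupil center lies within
    `tolerance` pixels (Euclidean distance) of the ground truth.

    pred, target: arrays of shape (N, 2) holding (x, y) pixel coordinates.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    dist = np.linalg.norm(pred - target, axis=1)  # per-sample Euclidean error
    return float(np.mean(dist <= tolerance))

# Example: three predictions, two of which fall inside the 10-pixel tolerance.
pred = [[30.0, 40.0], [100.0, 120.0], [55.0, 60.0]]
target = [[32.0, 43.0], [150.0, 120.0], [50.0, 58.0]]
print(p_accuracy(pred, target, tolerance=10))  # -> 0.666...
```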
The survey reviews the methods proposed by the participating teams, highlighting stateful models, spatial-temporal processing, computation and parameter efficiency, and event representations. Teams used various architectures, including GRU, ConvLSTM, BiLSTM, and Mamba, to handle the event data. Computation and parameter efficiency were emphasized throughout, with some teams implementing sparse convolution and temporal causal layers for efficient inference.

The best challenge solutions are described in detail, including the MambaPupil method by USTCEventGroup, the CETM by FreeEvs, the lightweight spatio-temporal network by bigBrains, the FPGA-based system by Go Sparse, the memory-channel-based approach by MeMo, the Efficient Recurrent Vision Transformer (ERVT) by team ERVT, and the efficient point-based eye tracking method by EFFICIENT, covering each team's methodology, implementation details, and results.

The survey concludes with insights from the challenge, emphasizing the emerging nature of event-based visual processing, the importance of hardware considerations, and the feasibility of using event cameras for eye-tracking tasks. It also highlights the need for prototyping and for more realistic settings to advance event-based eye-tracking systems.
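To make the notion of an event representation discussed above concrete, the sketch below accumulates a raw (t, x, y, polarity) event stream into per-polarity count frames, one common dense representation for CNN/RNN pipelines. This is an assumed, simplified variant for illustration only and is not tied to any particular team's pipeline.

```python
import numpy as np

def events_to_frames(events, num_bins, height, width):
    """Accumulate a raw event stream into `num_bins` two-channel count frames
    (one channel per polarity).

    events: array of shape (N, 4) with columns (t, x, y, p), p in {0, 1}.
    Returns an array of shape (num_bins, 2, height, width).
    """
    events = np.asarray(events, dtype=np.float64)
    frames = np.zeros((num_bins, 2, height, width), dtype=np.float32)
    if len(events) == 0:
        return frames
    t, x, y, p = events[:, 0], events[:, 1], events[:, 2], events[:, 3]
    # Map each timestamp to a temporal bin index over the stream's duration.
    t0, t1 = t.min(), t.max()
    bins = np.clip(((t - t0) / max(t1 - t0, 1e-9) * num_bins).astype(int),
                   0, num_bins - 1)
    # Scatter-add each event into its (bin, polarity, y, x) cell.
    np.add.at(frames, (bins, p.astype(int), y.astype(int), x.astype(int)), 1.0)
    return frames

# Toy stream: four events (t, x, y, polarity) on a 4x4 sensor, split into 2 bins.
ev = [[0.00, 1, 1, 1], [0.01, 2, 1, 0], [0.05, 3, 2, 1], [0.09, 0, 3, 1]]
print(events_to_frames(ev, num_bins=2, height=4, width=4).sum(axis=(1, 2, 3)))  # [2. 2.]
```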
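Likewise, the stateful models highlighted above can be sketched generically: a small CNN encodes each binned event frame, a GRU carries temporal state across frames, and a linear head regresses the pupil center. This PyTorch sketch is not any team's actual architecture; the layer sizes and input resolution are illustrative assumptions only.

```python
import torch
import torch.nn as nn

class TinyEventTracker(nn.Module):
    """Toy stateful tracker: CNN per event frame + GRU over time + (x, y) head."""

    def __init__(self, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(2, 8, 3, stride=2, padding=1), nn.ReLU(),   # 2 polarity channels
            nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),                              # -> (16, 4, 4)
            nn.Flatten(),                                         # -> 256 features
        )
        self.rnn = nn.GRU(input_size=256, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)

    def forward(self, frames, state=None):
        # frames: (batch, time, 2, H, W) binned event count frames
        b, t = frames.shape[:2]
        feats = self.encoder(frames.flatten(0, 1)).view(b, t, -1)
        out, state = self.rnn(feats, state)   # carried state allows step-by-step inference
        return self.head(out), state          # (batch, time, 2) pupil-center estimates

# Smoke test with random "event frames" at 64x64 resolution.
model = TinyEventTracker()
coords, h = model(torch.rand(1, 8, 2, 64, 64))
print(coords.shape)  # torch.Size([1, 8, 2])
```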