Enhancing Eye-Tracking Performance through Multi-Task Learning Transformer

11 Aug 2024 | Weigeng Li, Neng Zhou, and Xiaodong Qu
This paper introduces an innovative EEG signal reconstruction sub-module to enhance the performance of deep learning models in EEG eye-tracking tasks. The sub-module is designed to integrate with Encoder-Classifier models and enables end-to-end training within a multi-task learning framework. Because it is trained without labels (unsupervised), it can be reused across a variety of tasks.

Its effectiveness is demonstrated by incorporating it into advanced models, including Transformers and pre-trained Transformers. The results show a marked improvement in feature representation, evidenced by a Root Mean Squared Error (RMSE) of 54.1 mm, a notable improvement over existing methods. Integrated as a sub-task alongside the main task, the sub-module strengthens the encoder's feature extraction while preserving the original model's end-to-end training process. Unlike pre-training approaches such as autoencoders, it avoids a separate pre-training stage, which saves computational cost and adapts more flexibly to different model structures.

The paper also surveys multi-task learning (MTL) in EEG tasks, highlighting its advantages in emotion recognition, classification, and disease prediction. Two research questions guide the work: whether EEG signal reconstruction can enhance the Transformer encoder's feature-extracting ability, and which aspects of the prediction results improve after integrating the framework. To answer them, the study combines MTL with Vision Transformers (ViTs) on the EEGEyeNet eye-tracking task.

The architecture is a multi-task framework that handles several sub-tasks simultaneously, with a shared representation module maintaining feature-extraction capability for all of them. The representation module uses a convolutional layer followed by a pre-trained Vision Transformer (ViT) encoder to capture complex patterns in the data. The prediction module consists of fully connected layers with dropout and outputs the final inference results, while the reconstruction module uses spatial and temporal deconvolution blocks to reconstruct the input data. During training, the framework integrates the losses from the sub-tasks with the primary eye-tracking loss, so the reconstruction objective enhances the training of the main task, as the two sketches below illustrate.
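The paper itself does not include code, so the following is a minimal PyTorch sketch of how an Encoder-Classifier model with this kind of reconstruction sub-module could be assembled. All names and hyperparameters are illustrative assumptions, including the 129-channel, 500-sample input shape used by EEGEyeNet and the use of nn.TransformerEncoder in place of the paper's pre-trained ViT encoder; this is not the authors' implementation.

```python
import torch
import torch.nn as nn

class Representation(nn.Module):
    """Shared encoder: a convolutional front-end followed by a Transformer
    encoder. nn.TransformerEncoder stands in for the paper's pre-trained
    ViT encoder, which would be dropped in at the same point."""
    def __init__(self, in_ch=129, d_model=256, n_layers=4, n_heads=8):
        super().__init__()
        # The convolution compresses raw EEG into a shorter token sequence.
        self.conv = nn.Conv1d(in_ch, d_model, kernel_size=25, stride=10)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, x):                       # x: (B, 129, 500)
        tokens = self.conv(x).transpose(1, 2)   # (B, 48, d_model)
        return self.encoder(tokens)

class Prediction(nn.Module):
    """Fully connected layers with dropout; outputs the (x, y) gaze estimate."""
    def __init__(self, d_model=256, hidden=128, out_dim=2, p=0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, h):                       # h: (B, T, d_model)
        return self.net(h.mean(dim=1))          # pool over tokens, then regress

class Reconstruction(nn.Module):
    """Maps encoder features back to the raw EEG. A single ConvTranspose1d
    stands in for the paper's separate spatial and temporal deconvolution
    blocks; output_padding=5 makes the output length match the assumed
    500-sample input exactly."""
    def __init__(self, d_model=256, out_ch=129):
        super().__init__()
        self.deconv = nn.ConvTranspose1d(
            d_model, out_ch, kernel_size=25, stride=10, output_padding=5)

    def forward(self, h):                       # h: (B, T, d_model)
        return self.deconv(h.transpose(1, 2))   # (B, 129, 500)

class EEGMultiTaskModel(nn.Module):
    """Encoder-Classifier model with the reconstruction sub-module attached."""
    def __init__(self):
        super().__init__()
        self.backbone = Representation()
        self.pred = Prediction()
        self.recon = Reconstruction()

    def forward(self, x):
        h = self.backbone(x)
        return self.pred(h), self.recon(h)
```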
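Training is then a single end-to-end loop in which the unsupervised reconstruction loss is added to the supervised gaze-regression loss. The weighting factor `lam` below is a hypothetical hyperparameter, and using MSE for both terms is an assumption; the paper specifies only that the sub-task losses are integrated with the primary task's loss.

```python
import torch
import torch.nn.functional as F

def training_step(model, x, gaze_xy, optimizer, lam=0.1):
    """One end-to-end update. The primary gaze-regression loss and the
    unsupervised reconstruction loss are computed on the same batch and
    combined; `lam` is an assumed weighting hyperparameter."""
    optimizer.zero_grad()
    pred_xy, x_hat = model(x)
    loss_task = F.mse_loss(pred_xy, gaze_xy)    # supervised eye-tracking task
    loss_recon = F.mse_loss(x_hat, x)           # unsupervised sub-task
    loss = loss_task + lam * loss_recon         # multi-task objective
    loss.backward()
    optimizer.step()
    return loss_task.item(), loss_recon.item()

# Usage with random stand-in data (EEGMultiTaskModel from the sketch above):
model = EEGMultiTaskModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
x = torch.randn(8, 129, 500)    # a batch of EEG windows (channels x samples)
gaze = torch.randn(8, 2)        # target (x, y) screen positions
print(training_step(model, x, gaze, opt))
```

Because the reconstruction target is the input itself, the sub-task needs no extra labels, which is what makes the sub-module portable across tasks: only the prediction head and the task loss change.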
In experiments on the EEGEyeNet dataset, the proposed model achieves an RMSE of 54.1 mm, improving on the previous state-of-the-art RMSE of 55.4 mm. These results indicate that the architecture and training methodology handle the complexities of EEG data in eye-tracking tasks effectively. The study also discusses the implications of this work for EEG-based eye tracking, pointing toward more precise and reliable systems in applications such as neuromarketing and the diagnosis of neurological disorders. It concludes that integrating multi-task learning with Vision Transformers offers a promising approach for improving EEG-based eye-tracking systems.