26 Oct 2024 | Hyejeong Jo, Yiqian Yang, Juhyeok Han, Yiqun Duan, Hui Xiong, Won Hee Lee
This paper critically evaluates existing EEG-to-Text models, highlighting a major limitation: previous studies often used implicit teacher forcing during evaluation, artificially inflating performance metrics. They also lacked a critical control, since they never compared model performance on real EEG against pure-noise inputs.

The authors propose a methodology to distinguish models that genuinely learn from EEG signals from those that merely memorize training data: evaluate without teacher forcing, and benchmark each model on random noise fed in place of the EEG input. Their analysis shows that existing models perform about as well on noise as on real EEG data, raising serious doubts about whether they learn from the brain signal at all. Both checks are sketched below.
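The evaluation pitfall is easy to state in code. Below is a minimal sketch, assuming a recent HuggingFace-style sequence-to-sequence model (e.g. BART) fine-tuned to map EEG feature sequences to text; `model`, `eeg_embeds`, and `target_ids` are illustrative placeholders, not the paper's actual code:

```python
import torch

@torch.no_grad()
def decode_teacher_forced(model, eeg_embeds, target_ids):
    # Implicit teacher forcing: the ground-truth tokens are shifted and fed
    # as decoder input, so each output token only has to predict one step
    # ahead of the reference. Scoring these predictions against the
    # reference inflates the metrics.
    logits = model(inputs_embeds=eeg_embeds, labels=target_ids).logits
    return logits.argmax(dim=-1)

@torch.no_grad()
def decode_free_running(model, eeg_embeds, max_len=64):
    # Honest evaluation: the decoder must generate autoregressively from
    # its own previous predictions, with no access to the reference text.
    return model.generate(inputs_embeds=eeg_embeds,
                          max_length=max_len, num_beams=5)
```

The second check is the noise baseline: run the exact same free-running decoding on random inputs with the same shape as the EEG features. The helper below is likewise a hypothetical sketch:

```python
def make_noise_like(eeg_embeds):
    # Control condition: random input with the same shape and dtype as the
    # real EEG feature batch.
    return torch.randn_like(eeg_embeds)

# Usage sketch: compare both conditions under identical decoding settings.
# real_preds  = decode_free_running(model, eeg_embeds)
# noise_preds = decode_free_running(model, make_noise_like(eeg_embeds))
```

If the two conditions yield similar scores, the model's output is driven by memorized language statistics rather than by the EEG input, which is precisely the failure mode the paper reports.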
The study uses the ZuCo datasets, which pair EEG signals with eye-tracking data collected during natural reading tasks. Models were trained on Nvidia RTX 4090 GPUs with a batch size of 32 and a learning rate of 2e-5 for 30 epochs, and evaluated with BLEU, ROUGE, and WER.

The results show that teacher forcing substantially boosts reported scores, while models trained on EEG and on random inputs perform nearly identically, suggesting they memorize language patterns rather than decode EEG. The paper therefore advocates stricter evaluation practices for EEG-to-Text research: transparent reporting, evaluation without teacher forcing, and routine noise-input benchmarking, so that models are judged on their ability to learn from the EEG data itself.
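For completeness, the reported metrics can be computed with standard open-source libraries. The snippet below is a sketch using nltk, rouge-score, and jiwer; the paper's exact metric configuration (n-gram orders, smoothing, ROUGE variants) is an assumption here:

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction
from rouge_score import rouge_scorer
from jiwer import wer

def score(references, hypotheses):
    # BLEU over whitespace tokens, with smoothing for short hypotheses.
    smooth = SmoothingFunction().method1
    bleu = corpus_bleu([[r.split()] for r in references],
                       [h.split() for h in hypotheses],
                       smoothing_function=smooth)
    # Average ROUGE-1 F1 across sentence pairs.
    scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
    rouge1 = sum(scorer.score(r, h)["rouge1"].fmeasure
                 for r, h in zip(references, hypotheses)) / len(references)
    # Word error rate: lower is better, unlike BLEU and ROUGE.
    word_error = wer(references, hypotheses)
    return {"BLEU": bleu, "ROUGE-1": rouge1, "WER": word_error}
```

Running this scorer on both the real-EEG and noise-input predictions makes the paper's central comparison concrete: a genuine EEG decoder should beat its own noise baseline by a clear margin.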