January 31, 2024 | Xiaoming Zhai, Matthew Nyaaba, Wenchao Ma
This study examines whether generative artificial intelligence (GAI) tools, specifically ChatGPT and GPT-4, can outperform humans on cognitively demanding problem-solving tasks in science. The research focuses on the National Assessment of Educational Progress (NAEP) science assessments for grades 4, 8, and 12, coding the tasks using a two-dimensional cognitive load framework. The results show that both ChatGPT and GPT-4 consistently outperform most students on individual items, with agreement rates of 90%, 75%, and 94% for grades 4, 8, and 12, respectively. However, their performance is not significantly sensitive to the cognitive demand of the tasks, except for Grade 4. For students, by contrast, the findings suggest that higher average ability scores are required to correctly answer questions with increasing cognitive demand. The study's implications include the need for educational practices to emphasize advanced cognitive skills, such as critical thinking and creativity, rather than focusing solely on cognitive intensity. Additionally, researchers should innovate assessment practices, moving away from cognitively intensive tasks toward creativity and analytical skills. The study highlights the potential of GAI to enhance educational outcomes but also underscores the importance of ethical and professional development considerations when integrating GAI into educational settings.