ChatGPT: The End of Online Exam Integrity?

17 June 2024 | Teo Susnjak, Timothy R. McIntosh
This study explores the impact of Large Language Models (LLMs) such as ChatGPT on the integrity of online examinations, focusing on their ability to undermine academic honesty through advanced reasoning capabilities. The authors developed an iterative self-reflective strategy that elicits critical thinking and higher-order reasoning from LLMs responding to complex multimodal exam questions involving both visual and textual data. Subject experts tested the strategy on real exam questions, and it was further evaluated on a dataset of 600 text descriptions of multimodal exam questions. Results indicate that the strategy can invoke latent multi-hop reasoning in LLMs, guiding them toward correct answers by integrating critical thinking drawn from each modality.

ChatGPT answered multimodal exam questions proficiently across 12 subjects, challenging prior assertions about LLM limitations in multimodal reasoning and underscoring the need for robust online exam security measures. The study also highlights the difficulty of detecting AI-generated text, the phenomenon of LLM hallucination, and the evolving debate about LLM reasoning capabilities. While LLMs produce fluent and persuasive text, their tendency to fabricate information poses a risk for academic dishonesty; the study shows, however, that LLMs can be prompted to self-critique and improve their reasoning through advanced prompting strategies.

The proposed multimodal self-reflective strategy decomposes a complex question into sub-tasks, one for each modality, and then prompts the model to reflect on and integrate its per-modality reasoning before committing to a final answer. Applied to real exam questions, this approach guided LLMs toward correct answers through self-reflection, demonstrating that they can perform complex reasoning across multiple modalities. A minimal sketch of such a loop follows.
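The paper does not publish an implementation, but the strategy as described maps naturally onto a prompt-chaining loop. The sketch below is a hypothetical Python illustration: the `complete` stub, the prompt wording, and the stopping rule are assumptions standing in for whatever LLM API and prompts the authors actually used.

```python
def complete(prompt: str) -> str:
    """Placeholder: send `prompt` to an LLM and return its reply text.

    Hypothetical stub; wire this to your provider's chat-completion API.
    """
    raise NotImplementedError("Connect to an LLM provider here.")


def solve_multimodal_question(figure_desc: str, question_text: str,
                              max_rounds: int = 3) -> str:
    """Iterative self-reflective answering of a two-modality exam question."""
    # Decompose the task into one sub-task per modality.
    visual_analysis = complete(
        "Interpret the following figure description in detail:\n" + figure_desc)
    textual_analysis = complete(
        "Restate what the following question is asking for:\n" + question_text)

    # Draft an integrated answer from both per-modality analyses.
    answer = complete(
        "Answer the exam question using both analyses.\n"
        f"Figure interpretation: {visual_analysis}\n"
        f"Question analysis: {textual_analysis}")

    # Self-reflection loop: the model critiques its own draft and revises
    # until it reports no remaining issues or the round budget is spent.
    for _ in range(max_rounds):
        critique = complete(
            "Review this answer for reasoning errors or missed links "
            "between the figure and the text. Reply OK if it is sound.\n"
            f"Answer: {answer}")
        if critique.strip().upper().startswith("OK"):
            break
        answer = complete(
            "Revise the answer to address the critique.\n"
            f"Answer: {answer}\nCritique: {critique}")
    return answer
```

Read this way, each `complete` call is a separate reasoning hop, and the critique-and-revise loop is one plausible sense in which the strategy "invokes latent multi-hop reasoning" across modalities.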
An evaluation of GPT-4V on multimodal exam questions across the same 12 subjects found that it excels in subjects requiring interpretative flexibility and narrative construction, while it struggles in fields demanding high precision and empirical rigor. The study concludes that, given these reasoning capabilities, robust online exam security measures are needed, such as advanced proctoring systems and more sophisticated multimodal exam questions, to mitigate academic misconduct enabled by AI technologies. It recommends proctored online exams, the reinstatement of viva-voce examinations, and enhanced multimodal exam strategies to preserve the integrity and effectiveness of online assessments.