This study addresses the significant challenge that Large Language Models (LLMs) such as ChatGPT pose to the integrity of online examinations, focusing on how their advanced reasoning capabilities can undermine academic honesty. The authors developed an iterative self-reflective strategy to invoke critical thinking and higher-order reasoning in LLMs when responding to complex multimodal exam questions involving both visual and textual data. The proposed strategy was evaluated by subject experts on real exam questions, and the performance of ChatGPT (GPT-4) with vision was assessed on a dataset of 600 text descriptions of multimodal exam questions. The results indicate that the proposed strategy can effectively steer LLMs toward correct answers by integrating critical thinking from each modality into the final response. ChatGPT demonstrated considerable proficiency in answering multimodal exam questions across 12 subjects, challenging prior assertions about the limitations of LLMs in multimodal reasoning. The findings emphasize the need for robust online exam security measures, such as advanced proctoring systems and more sophisticated multimodal exam questions, to mitigate potential academic misconduct enabled by AI technologies. The study also provides recommendations for enhancing the integrity and effectiveness of online assessments in light of the advanced reasoning capabilities of LLMs.