9 May 2024 | Zhuoxuan Jiang, Haoyuan Peng, Shanshan Feng, Fan Li, Dongsheng Li
This paper introduces a novel prompting strategy called Pedagogical Chain-of-Thought (PedCoT) to enable Large Language Models (LLMs) to identify mathematical reasoning mistakes. The approach is grounded in the Bloom Cognitive Model (BCM) and uses pedagogical principles to guide prompt design. PedCoT consists of three components: pedagogical principles for prompt design (PPP), a two-stage interaction process (TIP), and grounded PedCoT prompts. The PPP design is aligned with the BCM's six levels of cognitive ability, focusing on the lower three levels (remember, understand, apply). The TIP involves two stages, Regenerate and Extract-Compare, which identify reasoning mistakes by comparing the model's regenerated solution content with the content extracted from the answer being checked (see the sketch below). The method is evaluated on two public datasets containing math problems of varying difficulty levels. The results show that PedCoT significantly outperforms existing baselines at identifying mathematical reasoning mistakes, demonstrating the effectiveness of integrating pedagogical theory into prompt design. The study highlights the importance of domain knowledge in guiding LLMs through complex reasoning tasks and provides a foundation for automatic math answer grading. The findings suggest that LLMs can effectively identify mathematical reasoning mistakes when guided by pedagogical principles, offering a promising way to mitigate hallucination in LLMs.
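To make the two-stage interaction process more concrete, here is a minimal Python sketch of how Regenerate and Extract-Compare could be wired together. It is an illustration only: the prompt texts, the `call_llm` helper, and the step-by-step control flow are assumptions for this sketch, not the paper's actual PedCoT prompts or evaluation code.

```python
# Minimal sketch of a two-stage Regenerate / Extract-Compare loop.
# `call_llm` and the prompt wording are illustrative placeholders.

from dataclasses import dataclass


@dataclass
class StepVerdict:
    step_index: int
    is_correct: bool
    rationale: str


def call_llm(prompt: str) -> str:
    """Placeholder for any chat-completion backend; returns the model's raw text."""
    raise NotImplementedError


def regenerate_stage(problem: str, prior_steps: list[str]) -> str:
    """Stage 1 (Regenerate): ask the model to produce the next reasoning step,
    structured around the lower three Bloom levels (remember, understand, apply)."""
    prompt = (
        "You are a math teacher. Given the problem and the verified previous steps,\n"
        "first recall the relevant concepts and formulas (remember),\n"
        "then outline the solution idea (understand),\n"
        "then write out the next solution step in full (apply).\n\n"
        f"Problem: {problem}\n"
        f"Previous steps: {prior_steps}\n"
    )
    return call_llm(prompt)


def extract_compare_stage(problem: str, regenerated: str,
                          student_step: str, step_index: int) -> StepVerdict:
    """Stage 2 (Extract-Compare): extract concepts, solution idea, and calculation
    from both steps, compare them, and judge whether the student's step is wrong."""
    prompt = (
        "Compare the reference step with the student's step.\n"
        "Extract from each: (1) concepts/formulas used, (2) solution idea, (3) calculation.\n"
        "Then answer CORRECT or INCORRECT and explain the first mistake, if any.\n\n"
        f"Problem: {problem}\n"
        f"Reference step: {regenerated}\n"
        f"Student step: {student_step}\n"
    )
    response = call_llm(prompt)
    return StepVerdict(
        step_index=step_index,
        is_correct="INCORRECT" not in response.upper(),
        rationale=response,
    )


def find_first_mistake(problem: str, student_steps: list[str]) -> StepVerdict | None:
    """Walk the student's solution step by step; return the first step judged incorrect."""
    for i, step in enumerate(student_steps):
        reference = regenerate_stage(problem, student_steps[:i])
        verdict = extract_compare_stage(problem, reference, step, i)
        if not verdict.is_correct:
            return verdict
    return None
```

The sketch only shows the control flow of the two stages; in PedCoT the prompts at each stage are grounded in the pedagogical principles derived from the BCM, which is what the paper credits for the improvement over plain chain-of-thought baselines.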