20 May 2024 | Aniket Didolkar, Anirudh Goyal, Nan Rosemary Ke, Siyuan Guo, Michal Valko, Timothy Lillicrap, Danilo Rezende, Yoshua Bengio, Michael Mozer, Sanjeev Arora
The paper explores the metacognitive capabilities of large language models (LLMs) in mathematical problem-solving. It demonstrates that LLMs possess metacognitive knowledge, including the ability to identify and apply specific skills to tasks. The authors develop a method that guides an LLM to assign meaningful skill labels to math questions, then performs semantic clustering to obtain coarser skill labels that are more interpretable to humans. They validate the effectiveness of these skill labels through experiments with GPT-4 on the math datasets GSM8K and MATH. The results show significant improvements in problem-solving accuracy, particularly when the LLM is provided with in-context examples tied to the identified skills. The methodology is domain-agnostic and can be applied to other problem-solving tasks. The paper also discusses the transferability of these skill labels to weaker LLMs and the potential for further enhancing LLM capabilities through fine-tuning.
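A minimal sketch of the pipeline the summary describes: an LLM labels each question with a fine-grained skill, those labels are merged into coarser human-interpretable skills, and new questions are then answered with skill-matched in-context exemplars. The function names, prompt wording, and LLM-based grouping step are illustrative assumptions, not the authors' exact implementation; any chat-completion backend (e.g., GPT-4, as in the paper) can be plugged in via the `llm` callable.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical LLM interface: takes a prompt string, returns the model's reply.
LLM = Callable[[str], str]

def label_skill(llm: LLM, question: str) -> str:
    """Ask the model which skill a problem requires (metacognitive labeling)."""
    prompt = (
        "What single mathematical skill is needed to solve this problem? "
        "Answer with a short snake_case label.\n\n" + question
    )
    return llm(prompt).strip().lower()

def cluster_skills(llm: LLM, skills: List[str], k: int) -> Dict[str, str]:
    """Merge fine-grained labels into at most k coarser, interpretable skills.
    Here the LLM does the semantic grouping; embedding-based clustering
    would be a reasonable alternative."""
    prompt = (
        f"Group these skill labels into at most {k} broader skills. "
        "Output one line per label as 'fine_label -> coarse_label'.\n"
        + "\n".join(sorted(set(skills)))
    )
    mapping: Dict[str, str] = {}
    for line in llm(prompt).splitlines():
        if "->" in line:
            fine, coarse = (part.strip() for part in line.split("->", 1))
            mapping[fine] = coarse
    return mapping

def build_skill_exemplars(questions, answers, coarse_labels):
    """Index solved examples by coarse skill for in-context retrieval."""
    exemplars = defaultdict(list)
    for q, a, s in zip(questions, answers, coarse_labels):
        exemplars[s].append((q, a))
    return exemplars

def solve_with_skill(llm: LLM, exemplars, skill_map, question: str, n_shots: int = 4) -> str:
    """Label the new question, then prompt with skill-matched exemplars."""
    fine = label_skill(llm, question)
    coarse = skill_map.get(fine, fine)
    shots = exemplars.get(coarse, [])[:n_shots]
    prompt = f"Relevant skill: {coarse}\n\n"
    for q, a in shots:
        prompt += f"Q: {q}\nA: {a}\n\n"
    prompt += f"Q: {question}\nA:"
    return llm(prompt)
```

The key design point the paper's results hinge on is the last step: selecting in-context examples by shared coarse skill rather than at random, which is where the reported accuracy gains on GSM8K and MATH come from.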