20 May 2024 | Aniket Didolkar, Anirudh Goyal, Nan Rosemary Ke, Siyuan Guo, Michal Valko, Timothy Lillicrap, Danilo Rezende, Yoshua Bengio, Michael Mozer, Sanjeev Arora
This paper explores the metacognitive capabilities of large language models (LLMs) in mathematical problem solving. It presents a method to extract metacognitive knowledge from LLMs by identifying and clustering mathematical skills. The approach involves using a powerful LLM to label mathematical questions with specific skills, followed by semantic clustering to group similar skills into broader categories. This results in a "Skill Exemplar Repository" containing skill names and corresponding question-answer examples. The repository is then used to improve the performance of LLMs by providing relevant in-context examples during problem-solving.
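A minimal sketch of how such a pipeline could look is shown below. The `call_llm` wrapper is a hypothetical stand-in for any chat-completion API, and the clustering step here uses sentence embeddings with KMeans purely for illustration; the paper instead has the strong LLM itself group semantically similar skills, so treat the specific models and cluster count as assumptions.

```python
# Illustrative sketch: label questions with fine-grained skills, then cluster
# them into broader skill categories to build a Skill Exemplar Repository.
from collections import defaultdict

from sentence_transformers import SentenceTransformer  # assumed embedding model
from sklearn.cluster import KMeans


def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around a strong LLM (e.g. a chat-completion endpoint)."""
    raise NotImplementedError


def label_skill(question: str) -> str:
    # Step 1: ask the strong LLM to name the fine-grained skill a question exercises.
    prompt = (
        "Name the single mathematical skill needed to solve this problem, "
        f"as a short phrase:\n\n{question}"
    )
    return call_llm(prompt).strip().lower()


def build_skill_repository(questions, answers, n_clusters=100):
    # Step 2: label every training question with a fine-grained skill.
    fine_skills = [label_skill(q) for q in questions]

    # Step 3: cluster semantically similar skill names into broader categories
    # (an embedding + KMeans stand-in for the paper's LLM-driven grouping).
    embedder = SentenceTransformer("all-MiniLM-L6-v2")
    vectors = embedder.encode(fine_skills)
    cluster_ids = KMeans(n_clusters=n_clusters, n_init="auto").fit_predict(vectors)

    # Step 4: store (question, answer) exemplars under each broad skill label.
    repository = defaultdict(list)
    for q, a, cid in zip(questions, answers, cluster_ids):
        repository[f"skill_{cid}"].append((q, a))
    return repository
```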
The study validates the approach by showing that using skill labels and exemplars from the repository significantly improves the accuracy of LLMs on mathematical datasets such as GSM8K and MATH. While the method is demonstrated on math problems, it is domain-agnostic and its principles could extend to other problem-solving domains. The paper also demonstrates that skill-based in-context examples generated by a strong LLM can enhance the performance of weaker LLMs. Additionally, the approach is compatible with various prompting strategies, including Chain-of-Thought (CoT) and Program-Aided Language models (PAL), and improves their effectiveness by providing skill-aligned examples.
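The sketch below illustrates one way the repository might be used at inference time: have the model identify the relevant skill for a new question, retrieve matching exemplars, and prepend them as in-context examples in a CoT-style prompt. The function names and prompt wording are illustrative rather than the paper's exact prompts; `call_llm` and `repository` refer to the sketch above.

```python
def identify_skill(question: str, skill_names: list[str]) -> str:
    # Ask the LLM (acting metacognitively) which broad skill the question needs.
    prompt = (
        "Choose the one skill from the list below that is most relevant to the "
        f"question.\nSkills: {', '.join(skill_names)}\nQuestion: {question}\nSkill:"
    )
    return call_llm(prompt).strip()


def answer_with_skill_exemplars(question: str, repository, k: int = 4) -> str:
    skill = identify_skill(question, list(repository.keys()))
    exemplars = repository.get(skill, [])[:k]

    # Skill-aligned in-context examples, followed by the target question (CoT style).
    shots = "\n\n".join(
        f"Q: {q}\nA: Let's think step by step. {a}" for q, a in exemplars
    )
    prompt = f"{shots}\n\nQ: {question}\nA: Let's think step by step."
    return call_llm(prompt)
```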
Experiments show that the Skill-Based approach outperforms traditional prompting methods in accuracy across different mathematical datasets. It is particularly effective at reducing main-skill errors and improving the application of secondary skills. The paper concludes that the proposed approach provides a valuable framework for extracting metacognitive knowledge from LLMs, which can be used to enhance their problem-solving capabilities in various domains. The findings suggest that using skills to fine-tune models may further improve their capabilities, indicating a potential path for bootstrapping model capabilities beyond mathematics.