This paper introduces MathQuest, a comprehensive mathematics dataset derived from NCERT textbooks for Classes 11 and 12, designed to evaluate the mathematical problem-solving capabilities of large language models (LLMs). The dataset spans a wide range of mathematical concepts and levels of complexity. The authors conducted fine-tuning experiments on three prominent LLMs: LLaMA2, WizardMath, and MAmmoTH. The results show that the fine-tuned MAmmoTH-13B outperforms the other models, setting a strong baseline on NCERT mathematics problems.
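To make the experimental setup concrete, the following is a minimal sketch of such a fine-tuning run, assuming supervised instruction tuning with Hugging Face `transformers` and LoRA adapters via `peft`. The paper does not specify this exact tooling; the checkpoint name, data file, prompt format, and hyperparameters below are all illustrative assumptions, not the authors' reported configuration.

```python
# Hypothetical sketch: LoRA fine-tuning of a LLaMA2-style model on
# MathQuest-like question/solution pairs. All names and hyperparameters
# here are assumptions, not the paper's reported setup.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)

base = "meta-llama/Llama-2-13b-hf"   # assumed base checkpoint
tok = AutoTokenizer.from_pretrained(base)
tok.pad_token = tok.eos_token        # LLaMA tokenizers ship without a pad token

model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.float16)
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM"))

# "mathquest.json" is a placeholder: one record per problem with
# "question" and "solution" fields.
data = load_dataset("json", data_files="mathquest.json")["train"]

def to_features(ex):
    text = f"Question: {ex['question']}\nSolution: {ex['solution']}"
    return tok(text, truncation=True, max_length=1024)

data = data.map(to_features, remove_columns=data.column_names)

Trainer(
    model=model,
    args=TrainingArguments("mathquest-llama2",
                           per_device_train_batch_size=2,
                           gradient_accumulation_steps=8,
                           num_train_epochs=3, learning_rate=2e-5,
                           fp16=True, logging_steps=50),
    train_dataset=data,
    # mlm=False makes the collator copy input_ids into labels for
    # next-token prediction.
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
).train()
```

LoRA is chosen here only to keep the sketch tractable on a single GPU; full-parameter fine-tuning follows the same pattern with the `get_peft_model` call removed.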
Mathematical problem-solving requires not only understanding problem statements but also performing precise arithmetic calculations. Existing LLMs face challenges in solving complex math word problems, particularly those requiring intricate reasoning or domain-specific knowledge. The MathQuest dataset was created to address this gap by providing a diverse and complex set of mathematical problems. The dataset was augmented to increase its size and diversity, allowing for more comprehensive training of LLMs.
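The paper does not detail its augmentation procedure in this summary, but one common strategy for math word problems is to perturb the numeric values in a problem template and recompute the gold answer. The sketch below illustrates that strategy under this assumption; the template and field names are hypothetical, and the authors may instead use a different approach such as LLM-based rephrasing.

```python
# Hypothetical augmentation sketch: vary the numbers in a templated word
# problem and recompute the answer, yielding new (question, answer) pairs.
import random

TEMPLATE = ("A shopkeeper buys {n} notebooks at Rs. {cost} each and sells "
            "them at Rs. {price} each. What is the total profit?")

def make_variants(k, seed=0):
    rng = random.Random(seed)
    out = []
    for _ in range(k):
        n = rng.randint(5, 50)
        cost = rng.randint(10, 40)
        price = cost + rng.randint(1, 20)   # ensure a positive profit
        question = TEMPLATE.format(n=n, cost=cost, price=price)
        answer = n * (price - cost)         # recompute the gold answer
        out.append({"question": question, "answer": answer})
    return out

for ex in make_variants(3):
    print(ex["question"], "->", ex["answer"])
```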
The authors evaluated the fine-tuned models on several benchmarks: MathQuest itself, GSM-8K, the DeepMind mathematics dataset, NumGLUE, and SimulEq. The results indicate that MAmmoTH-13B achieves the highest accuracy across these benchmarks, demonstrating its effectiveness in solving mathematical problems. The study highlights the importance of fine-tuning LLMs on specialized datasets to improve their performance on mathematical reasoning tasks.
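The summary does not show the paper's scoring script, but accuracy on such benchmarks is conventionally computed as exact match on the final numeric answer extracted from the model's generated solution. A minimal sketch under that assumption (the `last_number` helper and the sample inputs are illustrative):

```python
# Hypothetical evaluation sketch: exact-match accuracy on the final
# numeric answer extracted from a model's generated solution.
import re

def last_number(text: str):
    """Return the last number in the text, or None if there is none."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(matches[-1]) if matches else None

def accuracy(predictions, references):
    correct = sum(
        1 for pred, gold in zip(predictions, references)
        if (p := last_number(pred)) is not None and abs(p - float(gold)) < 1e-6
    )
    return correct / len(references)

preds = ["... so the total profit is 150.", "The answer is 42"]
golds = [150, 41]
print(accuracy(preds, golds))  # 0.5
```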
The research contributes to the development of more effective LLMs for educational applications, particularly in mathematics. The findings suggest that with appropriate training and fine-tuning, LLMs can become valuable tools for enhancing mathematical education. However, the study also acknowledges the limitations of current LLMs in handling complex mathematical problems, emphasizing the need for further research and improvements in this area.