This paper introduces MathQuest, a comprehensive mathematics dataset derived from NCERT textbooks for Classes 11 and 12, designed to evaluate the mathematical problem-solving capabilities of large language models (LLMs). The dataset spans a wide range of mathematical concepts and levels of complexity. The authors conducted fine-tuning experiments on three prominent LLMs: LLaMA2, WizardMath, and MAmmoTH. The results show that the fine-tuned MAmmoTH-13B outperforms the other models, setting a strong baseline on NCERT mathematics problems.
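To make the experimental setup concrete, the following is a minimal sketch of such a fine-tuning run, assuming supervised instruction tuning with Hugging Face `transformers` and LoRA adapters via `peft`. The paper does not specify this exact tooling; the checkpoint name, data file, prompt format, and hyperparameters below are all illustrative assumptions, not the authors' reported configuration.

```python
# Hypothetical sketch: LoRA fine-tuning of a LLaMA2-style model on
# MathQuest-like question/solution pairs. All names and hyperparameters
# here are assumptions, not the paper's reported setup.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)

base = "meta-llama/Llama-2-13b-hf"   # assumed base checkpoint
tok = AutoTokenizer.from_pretrained(base)
tok.pad_token = tok.eos_token        # LLaMA tokenizers ship without a pad token

model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.float16)
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM"))

# "mathquest.json" is a placeholder: one record per problem with
# "question" and "solution" fields.
data = load_dataset("json", data_files="mathquest.json")["train"]

def to_features(ex):
    text = f"Question: {ex['question']}\nSolution: {ex['solution']}"
    return tok(text, truncation=True, max_length=1024)

data = data.map(to_features, remove_columns=data.column_names)

Trainer(
    model=model,
    args=TrainingArguments("mathquest-llama2",
                           per_device_train_batch_size=2,
                           gradient_accumulation_steps=8,
                           num_train_epochs=3, learning_rate=2e-5,
                           fp16=True, logging_steps=50),
    train_dataset=data,
    # mlm=False makes the collator copy input_ids into labels for
    # next-token prediction.
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
).train()
```

LoRA is chosen here only to keep the sketch tractable on a single GPU; full-parameter fine-tuning follows the same pattern with the `get_peft_model` call removed.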
Mathematical problem-solving requires not only understanding problem statements but also performing precise arithmetic calculations. Existing LLMs face challenges in solving complex math word problems, particularly those requiring intricate reasoning or domain-specific knowledge. The MathQuest dataset was created to address this gap by providing a diverse and complex set of mathematical problems. The dataset was augmented to increase its size and diversity, allowing for more comprehensive training of LLMs.
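The paper does not detail its augmentation procedure in this summary, but one common strategy for math word problems is to perturb the numeric values in a problem template and recompute the gold answer. The sketch below illustrates that strategy under this assumption; the template and field names are hypothetical, and the authors may instead use a different approach such as LLM-based rephrasing.

```python
# Hypothetical augmentation sketch: vary the numbers in a templated word
# problem and recompute the answer, yielding new (question, answer) pairs.
import random

TEMPLATE = ("A shopkeeper buys {n} notebooks at Rs. {cost} each and sells "
            "them at Rs. {price} each. What is the total profit?")

def make_variants(k, seed=0):
    rng = random.Random(seed)
    out = []
    for _ in range(k):
        n = rng.randint(5, 50)
        cost = rng.randint(10, 40)
        price = cost + rng.randint(1, 20)   # ensure a positive profit
        question = TEMPLATE.format(n=n, cost=cost, price=price)
        answer = n * (price - cost)         # recompute the gold answer
        out.append({"question": question, "answer": answer})
    return out

for ex in make_variants(3):
    print(ex["question"], "->", ex["answer"])
```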
The authors evaluated the fine-tuned models on several benchmarks: MathQuest itself, GSM-8K, the DeepMind mathematics dataset, NumGLUE, and SimulEq. The results indicate that MAmmoTH-13B achieves the highest accuracy across these benchmarks, demonstrating its effectiveness in solving mathematical problems. The study highlights the importance of fine-tuning LLMs on specialized datasets to improve their performance on mathematical reasoning tasks.
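The summary does not show the paper's scoring script, but accuracy on such benchmarks is conventionally computed as exact match on the final numeric answer extracted from the model's generated solution. A minimal sketch under that assumption (the `last_number` helper and the sample inputs are illustrative):

```python
# Hypothetical evaluation sketch: exact-match accuracy on the final
# numeric answer extracted from a model's generated solution.
import re

def last_number(text: str):
    """Return the last number in the text, or None if there is none."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(matches[-1]) if matches else None

def accuracy(predictions, references):
    correct = sum(
        1 for pred, gold in zip(predictions, references)
        if (p := last_number(pred)) is not None and abs(p - float(gold)) < 1e-6
    )
    return correct / len(references)

preds = ["... so the total profit is 150.", "The answer is 42"]
golds = [150, 41]
print(accuracy(preds, golds))  # 0.5
```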
The research contributes to the development of more effective LLMs for educational applications, particularly in mathematics. The findings suggest that with appropriate training and fine-tuning, LLMs can become valuable tools for enhancing mathematical education. However, the study also acknowledges the limitations of current LLMs in handling complex mathematical problems, emphasizing the need for further research and improvements in this area.