What Makes Math Word Problems Challenging for LLMs?

1 Apr 2024 | KV Aditya Srivatsa, Ekaterina Kochmar
This paper investigates the challenges that large language models (LLMs) face when solving math word problems (MWPs). The authors analyze the linguistic and mathematical characteristics of MWPs and train feature-based classifiers to understand how these features affect how difficult MWPs are for LLMs. They hypothesize that this difficulty depends on the linguistic complexity of the question, the conceptual complexity of the task, and the amount of real-world knowledge required. The study uses the GSM8K dataset, which contains a diverse set of MWPs, and evaluates several LLMs, including Llama2, Mistral-7B, and MetaMath-13B. The results show that longer questions, more complex mathematical operations, and the need for extraneous information significantly lower the success rate of LLMs on MWPs.

The study also analyzes feature importance and conducts ablation studies to measure the contribution of each feature type. The findings suggest that while LLMs can solve some MWPs, they struggle with problems that require extensive reasoning or complex mathematical operations, or that draw on a broad range of real-world knowledge.
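The feature-based approach can be illustrated with a minimal Python sketch. It assumes GSM8K's answer format, in which each solution step carries a calculator annotation such as `<<48/2=24>>`. The features below (question word and sentence counts, number of annotated steps, operator counts) are a small hypothetical stand-in, not the paper's actual feature set, and `extract_features` and `MWPFeatures` are illustrative names. Feature vectors of this kind could then serve as input to a classifier that predicts whether a model answers a question correctly.

```python
import re
from dataclasses import dataclass

# Hypothetical sketch: GSM8K-style solutions embed calculator annotations
# such as <<48/2=24>>; each annotation is treated as one reasoning step.
CALC_ANNOTATION = re.compile(r"<<([^>]*)>>")
OPERATORS = "+-*/"


@dataclass
class MWPFeatures:
    question_words: int      # linguistic: question length in words
    question_sentences: int  # linguistic: rough sentence count
    reasoning_steps: int     # mathematical: number of calculator annotations
    num_operations: int      # mathematical: arithmetic operators across all steps
    distinct_operators: int  # mathematical: variety of operators used


def extract_features(question: str, solution: str) -> MWPFeatures:
    """Compute a few illustrative features for one GSM8K-style item."""
    words = question.split()
    # Naive split on terminal punctuation; a real pipeline would use a tokenizer.
    sentences = [s for s in re.split(r"[.!?]+", question) if s.strip()]
    steps = CALC_ANNOTATION.findall(solution)
    # Keep only the expression on the left of '=' in each annotation.
    expressions = [step.split("=")[0] for step in steps]
    # A leading minus on a negative number is also counted; fine for a rough sketch.
    ops = [ch for expr in expressions for ch in expr if ch in OPERATORS]
    return MWPFeatures(
        question_words=len(words),
        question_sentences=len(sentences),
        reasoning_steps=len(steps),
        num_operations=len(ops),
        distinct_operators=len(set(ops)),
    )


if __name__ == "__main__":
    question = (
        "Natalia sold clips to 48 of her friends in April, and then she sold "
        "half as many clips in May. How many clips did Natalia sell altogether "
        "in April and May?"
    )
    solution = (
        "Natalia sold 48/2 = <<48/2=24>>24 clips in May.\n"
        "Natalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n"
        "#### 72"
    )
    print(extract_features(question, solution))
    # → MWPFeatures(question_words=31, question_sentences=2, reasoning_steps=2, num_operations=2, distinct_operators=2)
```

Counting annotated steps and operators gives a cheap proxy for the "conceptual complexity" dimension, while word and sentence counts stand in for linguistic complexity; the real-world-knowledge dimension would need separate features not sketched here.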