MathScale is a method for enhancing the mathematical reasoning capabilities of large language models (LLMs) by generating high-quality synthetic data. Inspired by how humans learn mathematics, MathScale first extracts topics and knowledge points from seed math questions, then constructs a concept graph that estimates the connections between concepts. Sampling from this graph drives the generation of new math questions, yielding a large-scale dataset (MathScaleQA) of two million question-answer pairs. For comprehensive evaluation, the authors introduce MwpBench, a benchmark of ten datasets covering K-12, college, and competition-level math problems. MathScale-7B, fine-tuned on MathScaleQA, achieves state-of-the-art performance on MwpBench, outperforming its best peers of equivalent size by 42.9% in micro average accuracy and 43.7% in macro average accuracy. Experiments demonstrate the method's scalability and effectiveness: it generates diverse and complex math problems that measurably improve the mathematical reasoning of LLMs.
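
To make the concept-graph step concrete, here is a minimal sketch in Python of how such a graph might be built and sampled. The paper performs both extraction and question generation with an LLM; the function names, the co-occurrence edge weighting, and the example extractions below are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import defaultdict
from itertools import combinations

def build_concept_graph(extracted):
    """Build a weighted concept graph from per-question extractions
    (lists of topics/knowledge points). Edge weight is a simple
    co-occurrence count, an assumed proxy for concept relatedness."""
    graph = defaultdict(lambda: defaultdict(int))
    for concepts in extracted:
        for a, b in combinations(set(concepts), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph

def sample_concepts(graph, start, walk_len=3):
    """Weighted random walk over the graph to pick a mix of related
    concepts that could seed one new synthetic question."""
    node, picked = start, [start]
    for _ in range(walk_len):
        neighbors = graph[node]
        if not neighbors:
            break
        nodes, weights = zip(*neighbors.items())
        node = random.choices(nodes, weights=weights, k=1)[0]
        if node not in picked:
            picked.append(node)
    return picked

# Hypothetical extractions from three seed questions.
extracted = [
    ["linear equations", "slope", "graphing"],
    ["slope", "rates of change", "word problems"],
    ["linear equations", "word problems"],
]
graph = build_concept_graph(extracted)
print(sample_concepts(graph, "slope"))
# The sampled concept set would then be inserted into a prompt asking
# an LLM to compose a new question-answer pair covering those concepts.
```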