InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning

24 May 2024 | Huaiyuan Ying¹,²*, Shuo Zhang¹,³, Linyang Li³, Zhejian Zhou¹,⁴, Yunfan Shao¹,³, Zhaoye Fei¹,³, Yichuan Ma¹, Jiawei Hong¹,³, Kuikun Liu¹, Ziyi Wang¹, Yudong Wang¹, Zijian Wu¹,³, Shuaibin Li¹, Fengzhe Zhou¹, Hongwei Liu¹, Songyang Zhang¹, Wenwei Zhang¹, Hang Yan¹, Xipeng Qiu³, Jiayu Wang¹, Kai Chen¹, Dahua Lin¹
InternLM-Math is an open-source math large language model (LLM) developed by Shanghai AI Laboratory and collaborating institutions. Built on the InternLM2-Base model, it is continually pre-trained to strengthen mathematical reasoning, verification, proof, and data-augmentation capabilities. These abilities are unified in a single seq2seq format that integrates chain-of-thought reasoning, reward modeling, formal reasoning, data augmentation, and a code interpreter.

The model is trained on a diverse corpus of Common Crawl (CC) retrieved data, domain-specific data, and synthetic data, then fine-tuned with supervised learning that incorporates problem augmentation, reward modeling, and code interpreters. InternLM-Math achieves state-of-the-art performance on benchmarks including GSM8K, MATH, the Hungarian math exam, MathBench-ZH, and MiniF2F, scoring 30.3 on the MiniF2F test set without fine-tuning. It also explores using LEAN both for solving math problems and for multi-task learning, demonstrating LEAN's potential as a unified platform for solving and proving in math.

InternLM-Math can solve math problems, verify reasoning paths, and prove mathematical statements. It further integrates reasoning interleaved with coding (RICO), which allows more natural and effective problem solving, and it uses Python libraries through its code interpreter to handle complex calculations. On math reasoning tasks it outperforms comparable open models such as Llemma and Minerva. Models, code, and data are open-sourced on GitHub. Evaluation on both informal and formal benchmarks shows the model's effectiveness, but it also faces challenges such as false positives in math benchmarks, where a model may produce the correct final answer without sound intermediate reasoning.
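To make the LEAN discussion concrete, here is a toy Lean 4 statement of the kind a formal solver must prove, assuming Mathlib is available for the `ring` tactic. It is purely illustrative and is not drawn from the MiniF2F benchmark or the paper's data.

```lean
import Mathlib.Tactic

-- A toy statement in the spirit of formal problem solving:
-- an algebraic identity that an informal solver would expand by hand,
-- closed here by the `ring` tactic. (Illustrative; not from MiniF2F.)
theorem sum_sq (a b : ℕ) : (a + b) ^ 2 = a ^ 2 + 2 * a * b + b ^ 2 := by
  ring
```

Unlike a natural-language answer, a proof like this is machine-checked by the LEAN kernel, which is what makes the reasoning verifiable end to end.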
The model's performance is also sensitive to prompts and instructions, highlighting the need for further improvements in self-critique and process verification. Overall, InternLM-Math represents a significant step toward verifiable mathematical reasoning and shows potential for versatile math reasoning tasks.
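The code-interpreter style of solving described above can be sketched as follows: the reasoning is interleaved with executable Python, and the final answer comes from running the code rather than from the model's own arithmetic. This is a minimal illustration using SymPy, not InternLM-Math's actual interpreter interface.

```python
# Minimal sketch of code-interpreter-style math solving (illustrative;
# not InternLM-Math's API). The "reasoning" lives in the comments, and
# the final answer is computed by executing the code.
from sympy import symbols, solve

# Problem: "Two numbers sum to 10 and their product is 21.
# What is the larger number?"
x = symbols("x")
# If the numbers are x and 10 - x, their product gives x * (10 - x) = 21.
roots = solve(x * (10 - x) - 21, x)
larger = max(roots)
print(larger)  # -> 7
```

Delegating the algebra to a computer algebra system is exactly what lets the model handle calculations that chain-of-thought arithmetic alone gets wrong.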
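The false-positive problem mentioned above follows directly from how math benchmarks are typically graded: only the final answer is compared, so a solution with broken intermediate reasoning can still score as correct. The hypothetical checker below illustrates that outcome-only grading, and is not the paper's reward model.

```python
# Minimal sketch of outcome-only answer grading (illustrative; not the
# paper's reward model). Comparing only final answers is precisely why
# "false positive" solutions -- right answer, flawed reasoning -- pass.
def outcome_correct(predicted: str, reference: str) -> bool:
    """Return True iff the normalized final answers match."""
    return predicted.strip().rstrip(".") == reference.strip().rstrip(".")

# A chain like "2 + 2 = 5, so 5 - 1 = 4" ends in "4" and still passes:
print(outcome_correct("4", "4"))  # -> True
```

Catching such cases requires checking the reasoning path itself, which is what the summary's call for process verification and self-critique refers to.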