26 Jun 2024 | Wenhao Shi*, Zhiqiang Hu*, Yi Bin, Junhua Liu, Yang Yang, See-Kiong Ng, Lidong Bing, Roy Ka-Wei Lee
The paper "Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models" addresses the lack of high-quality, diverse multimodal mathematical datasets by collecting 40K high-quality images with question-answer pairs from 24 existing datasets and synthesizing 320K new pairs, creating the MathV360K dataset. This comprehensive dataset enhances both the breadth and depth of multimodal mathematical questions. The authors introduce Math-LLaVA, a LLaVA-1.5-based model fine-tuned with MathV360K, which significantly improves the multimodal mathematical reasoning capabilities of LLaVA-1.5, achieving a 19-point increase and comparable performance to GPT-4V on MathVista’s minitest split. Additionally, Math-LLaVA demonstrates enhanced generalizability, showing substantial improvements on the MMMU benchmark. The research highlights the importance of dataset diversity and synthesis in advancing MLLMs’ mathematical reasoning abilities. The code and data are available at: <https://github.com/HZQ950419/Math-LLaVA>.The paper "Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models" addresses the lack of high-quality, diverse multimodal mathematical datasets by collecting 40K high-quality images with question-answer pairs from 24 existing datasets and synthesizing 320K new pairs, creating the MathV360K dataset. This comprehensive dataset enhances both the breadth and depth of multimodal mathematical questions. The authors introduce Math-LLaVA, a LLaVA-1.5-based model fine-tuned with MathV360K, which significantly improves the multimodal mathematical reasoning capabilities of LLaVA-1.5, achieving a 19-point increase and comparable performance to GPT-4V on MathVista’s minitest split. Additionally, Math-LLaVA demonstrates enhanced generalizability, showing substantial improvements on the MMMU benchmark. The research highlights the importance of dataset diversity and synthesis in advancing MLLMs’ mathematical reasoning abilities. The code and data are available at: <https://github.com/HZQ950419/Math-LLaVA>.