17 Jun 2024 | Shihao Cai, Keqin Bao, Hangyu Guo, Jizhi Zhang, Jun Song, Bo Zheng
The paper introduces a novel pipeline, GeoGPT4V, designed to enhance the geometric capabilities of multi-modal large language models (MLLMs). The pipeline leverages GPT-4V and GPT-4 to generate simplified geometry problems with aligned text and images, addressing the challenge of effectively using image information in geometry problems. The authors create a dataset of 4.9K geometry problems, which is combined with 19K open-source data to form the GeoGPT4V dataset. Experimental results on the MathVista and MathVision benchmarks demonstrate that the GeoGPT4V dataset significantly improves the geometric performance of various models, achieving relative improvements of 58.2% and 33.8% for LLaVA-1.5-7B and ShareGPT4V-7B, respectively. The paper also includes a detailed analysis of the effectiveness of the generated images and the necessity of scoring and filtering the images. The GeoGPT4V dataset and model checkpoints are open-sourced to facilitate further research and development.
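To read the reported gains, it may help to recall how a relative improvement is usually computed. The formula below is the standard definition and is assumed here, since the summary does not state it; the symbols $s_{\text{base}}$ and $s_{\text{GeoGPT4V}}$ are illustrative names for a model's benchmark score before and after training on the GeoGPT4V dataset.

```latex
\[
\text{relative improvement}
  = \frac{s_{\text{GeoGPT4V}} - s_{\text{base}}}{s_{\text{base}}} \times 100\%
\]
```

Under this reading, a 58.2% relative improvement means the score after training is about 1.582 times the baseline score, not that it rose by 58.2 percentage points.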