ChartInstruct: Instruction Tuning for Chart Comprehension and Reasoning


14 Mar 2024 | Ahmed Masry, Mehrad Shahmohammadi, Md Rizwan Parvez, Enamul Hoque, Shafiq Joty
ChartInstruct is a chart-specific vision-language instruction-following dataset comprising 191,000 instructions generated with 71,000 charts. It is designed to improve the ability of vision-language models (VLMs) to understand and reason about charts, and it covers a wide range of chart-related tasks, including chart summarization, question answering, fact-checking, and reasoning.
Two systems are introduced for instruction tuning on this dataset. The end-to-end system modifies the LLaVA architecture by replacing its CLIP vision encoder with the UniChart vision encoder, which is pre-trained on chart images, and connects that encoder to a large language model (LLM). The pipeline system first extracts the underlying data table from the chart image and then provides the table as input to the LLM.
The models are evaluated on four downstream tasks, where they outperform existing baselines and achieve state-of-the-art results. Human evaluation further confirms the effectiveness of the instruction-tuning approach in supporting a wide array of real-world chart comprehension and reasoning scenarios. The dataset and models are publicly available for further research.
The paper also discusses related work on chart modeling, visual instruction tuning, and chart-domain downstream tasks. Its error analysis highlights the need for further improvements in numerical reasoning and factual accuracy. The authors emphasize the importance of ethical considerations in the dataset collection process and the potential risks of misuse of the models.
The paper concludes that ChartInstruct is a valuable resource for future research in chart understanding and reasoning.
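The two-stage flow of the pipeline system described in the summary (extract the data table, then pass it to the LLM) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the ` | ` cell separator, and the prompt wording are all assumptions, and the table extractor and LLM are passed in as plain callables so that any concrete models could be substituted.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical type: an extracted table is a header row plus data rows.
Table = Tuple[Sequence[str], List[Sequence[object]]]


def linearize_table(header: Sequence[str], rows: List[Sequence[object]]) -> str:
    """Flatten a table into text: one line per row, cells joined by ' | '.

    The separator is an assumption chosen for this sketch.
    """
    lines = [" | ".join(header)]
    for row in rows:
        lines.append(" | ".join(str(cell) for cell in row))
    return "\n".join(lines)


def build_prompt(table_text: str, instruction: str) -> str:
    """Combine the linearized table and the user instruction (assumed template)."""
    return (
        "Chart data table:\n"
        f"{table_text}\n\n"
        f"Instruction: {instruction}\n"
        "Answer:"
    )


def run_pipeline(
    chart_image: object,
    instruction: str,
    extract_table: Callable[[object], Table],
    llm: Callable[[str], str],
) -> str:
    """Stage 1: extract the data table. Stage 2: give the table text to the LLM."""
    header, rows = extract_table(chart_image)
    prompt = build_prompt(linearize_table(header, rows), instruction)
    return llm(prompt)
```

In this sketch the LLM never sees the image itself, only the extracted table, so the quality of the answer depends on the quality of the table extraction step. The end-to-end system avoids that dependency by feeding visual features from the chart encoder to the LLM directly.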