14 Mar 2024 | Ahmed Masry*, Mehrad Shahmohammadi*, Md Rizwan Parvez, Enamul Hoque*, Shafiq Joty**
**ChartInstruct: Instruction Tuning for Chart Comprehension and Reasoning**
**Authors:** Ahmed Masry, Mehrad Shahmohammadi, Md Rizwan Parvez, Enamul Hoque, Shafiq Joty
**Institution:** York University, Qatar Computing Research Institute (QCRI), Salesforce Research, Nanyang Technological University
**Abstract:**
Charts are widely used for data analysis and for conveying information. Tasks such as chart question answering and chart summarization have recently emerged, requiring models to understand and reason over chart structure and content. Models trained on vision tasks alone handle only a narrow range of chart-related tasks. To address this, the authors introduce *ChartInstruct*, a novel dataset of 191K instructions generated over 71K charts. They present two systems for instruction tuning: an end-to-end model that connects a vision encoder to a large language model (LLM), and a pipeline model that first extracts the chart's data table and then feeds it to the LLM. Experiments on four downstream tasks show state-of-the-art performance, demonstrating the effectiveness of instruction tuning for a broad range of real-world chart comprehension and reasoning scenarios.
**Contributions:**
1. A new instruction-following corpus with real-world charts and diverse tasks.
2. Two distinct systems for chart understanding tasks.
3. Extensive evaluations showing state-of-the-art performance on existing benchmarks and expanded applicability to new tasks.
**Methods:**
- **Dataset Collection:** The authors collect charts from public datasets and web sources, creating a diverse corpus called WebCharts.
- **Instruction Data Generation:** Using LLMs, they generate 191K instructions covering diverse chart comprehension and reasoning tasks, including summarization, question answering, fact-checking, and novel tasks proposed by the LLMs themselves.
- **Model Architectures:** Two systems are designed: an end-to-end model following the LLaVA architecture, which connects a vision encoder to an LLM, and a pipeline model that first extracts the chart's data table and then feeds it to the LLM as text.
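The pipeline variant described above can be sketched as follows. This is an illustrative toy, not the paper's implementation: the helper names, the table linearisation, and the prompt format are all assumptions standing in for a real chart-to-table extractor and a text-only LLM.

```python
# Toy sketch of ChartInstruct's pipeline system (illustrative, not the
# authors' code): extract the chart's data table, linearise it, and build
# a text prompt that a standard text-only LLM could answer.

def extract_data_table(chart: dict) -> str:
    """Stand-in for a chart-to-table extraction model: linearises the
    chart's underlying (x, y) pairs into one "x | y" row per line."""
    return "\n".join(f"{x} | {y}" for x, y in chart["data"].items())

def build_pipeline_prompt(chart: dict, instruction: str) -> str:
    """Pipeline system: the instruction plus the linearised table form
    the input to the LLM, so no vision encoder is needed at this stage."""
    table = extract_data_table(chart)
    return f"{instruction}\nData table:\n{table}"

if __name__ == "__main__":
    chart = {"title": "Sales by year", "data": {"2020": 10, "2021": 15}}
    print(build_pipeline_prompt(chart, "Which year had higher sales?"))
```

The end-to-end system would instead skip the table step and pass vision-encoder tokens for the chart image directly into the LLM's embedding space.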
**Results:**
- **Evaluation:** The models achieve state-of-the-art performance on four downstream tasks: ChartQA, Chart2Text, OpenCQA, and ChartFC.
- **Human Evaluation:** Human evaluations confirm that the models support a wide range of real-world chart comprehension and reasoning scenarios.
**Limitations:**
- The models still struggle with complex numerical questions and may produce factually incorrect statements in text generation tasks.
**Ethics:**
- The authors address ethical considerations in dataset collection and content filtering, emphasizing responsible use and cautioning against misuse.