14 Mar 2024 | Ahmed Masry*, Mehrad Shahmohammadi*, Md Rizwan Parvez, Enamul Hoque*, Shafiq Joty**
**ChartInstruct: Instruction Tuning for Chart Comprehension and Reasoning**
**Authors:** Ahmed Masry, Mehrad Shahmohammadi, Md Rizwan Parvez, Enamul Hoque, Shafiq Joty
**Institution:** York University, Qatar Computing Research Institute (QCRI), Salesforce Research, Nanyang Technological University
**Abstract:**
Charts are widely used for data analysis and for conveying information. Tasks such as chart question answering and chart summarization have recently emerged, requiring models to understand and reason over chart structure and content. Models trained on vision tasks alone handle only a narrow range of chart-related tasks. To address this, the authors introduce *ChartInstruct*, a novel dataset of 191K instructions generated over 71K charts. They present two systems for instruction tuning: an end-to-end model that connects a vision encoder to a large language model (LLM), and a pipeline model that first extracts the chart's data table and then feeds it to the LLM. Experiments on four downstream tasks show state-of-the-art performance, demonstrating the effectiveness of instruction tuning for a broad range of real-world chart comprehension and reasoning scenarios.
**Contributions:**
1. A new instruction-following corpus with real-world charts and diverse tasks.
2. Two distinct systems for chart understanding tasks.
3. Extensive evaluations showing state-of-the-art performance on existing benchmarks and expanded applicability to new tasks.
**Methods:**
- **Dataset Collection:** The authors collect charts from public datasets and web sources, creating a diverse corpus called WebCharts.
- **Instruction Data Generation:** Using LLMs, they generate 191K instructions covering diverse chart comprehension and reasoning tasks, including summarization, question answering, fact-checking, and novel tasks proposed by the LLMs themselves.
- **Model Architectures:** Two systems are designed: an end-to-end model following the LLaVA architecture, which connects a vision encoder to an LLM, and a pipeline model that first extracts the chart's data table and then feeds it to the LLM as text.
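The pipeline variant described above can be sketched as follows. This is an illustrative toy, not the paper's implementation: the helper names, the table linearisation, and the prompt format are all assumptions standing in for a real chart-to-table extractor and a text-only LLM.

```python
# Toy sketch of ChartInstruct's pipeline system (illustrative, not the
# authors' code): extract the chart's data table, linearise it, and build
# a text prompt that a standard text-only LLM could answer.

def extract_data_table(chart: dict) -> str:
    """Stand-in for a chart-to-table extraction model: linearises the
    chart's underlying (x, y) pairs into one "x | y" row per line."""
    return "\n".join(f"{x} | {y}" for x, y in chart["data"].items())

def build_pipeline_prompt(chart: dict, instruction: str) -> str:
    """Pipeline system: the instruction plus the linearised table form
    the input to the LLM, so no vision encoder is needed at this stage."""
    table = extract_data_table(chart)
    return f"{instruction}\nData table:\n{table}"

if __name__ == "__main__":
    chart = {"title": "Sales by year", "data": {"2020": 10, "2021": 15}}
    print(build_pipeline_prompt(chart, "Which year had higher sales?"))
```

The end-to-end system would instead skip the table step and pass vision-encoder tokens for the chart image directly into the LLM's embedding space.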
**Results:**
- **Evaluation:** The models achieve state-of-the-art performance on four downstream tasks: ChartQA, Chart2Text, OpenCQA, and ChartFC.
- **Human Evaluation:** Human evaluations confirm that the models support a wide range of real-world chart comprehension and reasoning scenarios.
**Limitations:**
- The models still struggle with complex numerical questions and may produce factually incorrect statements in text generation tasks.
**Ethics:**
- The authors address ethical considerations in dataset collection and content filtering, emphasizing responsible use and cautioning against misuse.