ChartThinker: A Contextual Chain-of-Thought Approach to Optimized Chart Summarization

25 Apr 2024 | Mengsha Liu, Daoyuan Chen, Yaliang Li, Guian Fang, Ying Shen
ChartThinker is a novel approach to chart summarization that integrates chain-of-thought (CoT) reasoning with context retrieval to enhance the logical coherence and accuracy of generated summaries. The method is built on a large-scale dataset of 595,955 chart-caption pairs and 8 million instruction-question pairs covering diverse visual styles and topics. This dataset is used to pre-train and fine-tune the model, improving its ability to match chart content and handle various chart types.

The model includes a chart parsing module that extracts a chart's underlying data and combines it with prompts to produce text features. A Context-Enhanced CoT Generator module then fuses thought chains with retrieved context, injecting additional logical structure and contextual information into the generation process.

ChartThinker is evaluated against eight baselines and demonstrates superior performance across seven metrics. Human evaluations likewise show that it produces more accurate and coherent summaries, and ablation studies confirm the contribution of each component. The study concludes that ChartThinker significantly improves chart summarization by leveraging CoT reasoning and context retrieval, offering a new approach for visual-language models in chart summarization tasks.
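To make the described pipeline concrete, the sketch below shows how the components (chart parsing, context retrieval, CoT prompting, generation) could fit together. It is a minimal illustration under assumptions, not the authors' implementation: all names here (parse_chart, retrieve_context, build_cot_prompt, summarize) are hypothetical stand-ins, and the actual parsing, retrieval, and generation models are left abstract.

```python
# Minimal sketch of a ChartThinker-style pipeline (hypothetical names,
# not the authors' API). The concrete chart-to-table model, retriever,
# and text generator are left as placeholders.
from dataclasses import dataclass

@dataclass
class ChartData:
    chart_type: str   # e.g. "bar", "line", "pie"
    series: dict      # label -> list of (x, y) values
    title: str

def parse_chart(image_path: str) -> ChartData:
    """Stand-in for the chart parsing module: extract the underlying
    data table and metadata from a chart image."""
    raise NotImplementedError("plug in a plot-to-table / OCR model here")

def retrieve_context(chart: ChartData, corpus: list[str], k: int = 3) -> list[str]:
    """Stand-in retrieval step: return the k captions from the corpus
    most relevant to the parsed chart (e.g. by embedding similarity)."""
    return corpus[:k]  # placeholder ranking

def build_cot_prompt(chart: ChartData, context: list[str]) -> str:
    """Fuse parsed data, retrieved context, and step-by-step reasoning
    instructions into a single generation prompt."""
    steps = (
        "1. Identify the chart type and variables.\n"
        "2. Describe the overall trend.\n"
        "3. Point out notable extremes or comparisons.\n"
        "4. Write a concise, coherent summary."
    )
    return (
        f"Chart type: {chart.chart_type}\nTitle: {chart.title}\n"
        f"Data: {chart.series}\n"
        "Related examples:\n" + "\n".join(context) + "\n"
        f"Reason step by step, then summarize:\n{steps}"
    )

def summarize(chart_image: str, corpus: list[str], generate) -> str:
    """End-to-end flow: parse -> retrieve -> CoT prompt -> generate.
    `generate` is any text-generation callable (prompt -> summary)."""
    chart = parse_chart(chart_image)
    context = retrieve_context(chart, corpus)
    prompt = build_cot_prompt(chart, context)
    return generate(prompt)
```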