Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs

19 Mar 2024 | Victor Cărbune*, Hassan Mansoor, Fangyu Liu, Rahul Aralikatte, Gilles Baechler, Jindong Chen, Abhanshu Sharma
The paper "Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs" by Victor Cărbune et al. addresses the challenge of improving reasoning capabilities in Vision-Language Models (VLMs) for multimodal tasks. The authors propose a technique for transferring capabilities from Large Language Models (LLMs) to VLMs, focusing on the ChartQA benchmark. They improve chart representation by extending the pre-training stage with an enhanced chart-to-table translation task and by constructing a larger dataset. To strengthen general reasoning and numerical operations, they synthesize reasoning traces using the table representation of charts. The model is then fine-tuned with a multitask loss function. The resulting model, ChartPaLI-5B, outperforms existing models on ChartQA, FigureQA, and PlotQA, even with 10x fewer parameters.

The paper also discusses the impact of the different techniques and provides ablation studies to quantify their effectiveness. The authors highlight the importance of internal representations and the use of synthetic data for improving VLMs' reasoning capabilities.
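To make the multitask fine-tuning idea concrete, here is a minimal sketch of combining per-task losses (e.g. chart-to-table translation and rationale-augmented QA) into a single weighted objective. The task names, weights, and loss values below are illustrative assumptions, not the paper's actual configuration.

```python
def multitask_loss(task_losses, weights=None):
    """Combine per-task scalar losses into one training objective.

    task_losses: dict mapping a task name to its (already computed) loss.
    weights: optional dict of per-task mixing weights; defaults to 1.0 each.
    """
    if weights is None:
        weights = {task: 1.0 for task in task_losses}
    return sum(weights[task] * loss for task, loss in task_losses.items())


# Hypothetical example: mix a chart-to-table loss with a QA loss that
# supervises synthesized reasoning traces alongside the final answer.
loss = multitask_loss(
    {"chart_to_table": 0.8, "qa_with_rationale": 1.2},
    weights={"chart_to_table": 0.5, "qa_with_rationale": 1.0},
)
print(loss)  # 0.5 * 0.8 + 1.0 * 1.2 = 1.6
```

In practice each entry of `task_losses` would be a token-level cross-entropy computed on a batch drawn from that task's dataset; the weighted sum is what gets backpropagated.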