OneChart is a framework for extracting structural information from charts and plots. It targets the variety of chart styles, values, and texts that even advanced large vision-language models (LVLMs) struggle to handle. OneChart uses an autoregressive main body and introduces an auxiliary token, placed at the beginning of the token sequence, together with an additional decoder. The auxiliary token is optimized for numerical outputs, and because of causal attention the tokens that follow can attend to it, which improves the reliability of the model's numerical predictions. The model also includes a self-evaluation mechanism that provides confidence scores for the generated content, which helps gauge how reliable a given parse is.
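The self-evaluation idea can be illustrated with a minimal sketch. It assumes that confidence is derived by comparing the numbers parsed from the generated text with the numbers predicted by the additional decoder. The summary does not specify the actual formula, so the function name `confidence_score`, the shared-range normalization, and the clamping are illustrative assumptions, not the paper's method.

```python
from typing import Sequence


def confidence_score(text_values: Sequence[float],
                     aux_values: Sequence[float]) -> float:
    """Return a score in [0, 1]; higher means the two outputs agree more.

    Hypothetical scoring rule (an assumption, not OneChart's formula):
    mean absolute difference between the two value lists, divided by
    the range spanned by all values, subtracted from 1.
    """
    # Mismatched or empty outputs cannot be compared: lowest confidence.
    if not text_values or len(text_values) != len(aux_values):
        return 0.0

    all_values = list(text_values) + list(aux_values)
    scale = max(all_values) - min(all_values)
    if scale == 0:
        # Every value is identical, so the two outputs agree exactly.
        return 1.0

    mean_abs_diff = sum(
        abs(t - a) for t, a in zip(text_values, aux_values)
    ) / len(text_values)

    # Clamp to [0, 1] to guard against rounding error.
    return max(0.0, min(1.0, 1.0 - mean_abs_diff / scale))


if __name__ == "__main__":
    print(confidence_score([10.0, 20.0, 30.0], [10.0, 20.0, 30.0]))
    print(confidence_score([0.0, 10.0], [0.0, 0.0]))
```

Under this assumed rule, a caller could discard or flag parses whose score falls below a chosen threshold. The threshold itself would have to be tuned on held-out data.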
OneChart outperforms current state-of-the-art (SOTA) chart parsing models, such as DePlot, ChartVLM, and ChartAst, in Average Precision (AP) for chart structural extraction across multiple public benchmarks. Despite having only 0.2 billion parameters, it achieves a 19.1% to 29.4% improvement in AP on charts lacking numerical annotations. Additionally, OneChart improves the accuracy of a popular LVLM (LLaVA-1.6) by 32.6% on the ChartQA benchmark.
The paper introduces the ChartY benchmark, which includes approximately 6K charts spanning various topics, types, and languages, providing a comprehensive platform for future research and evaluation. OneChart's contributions include:
1. **Introduction of OneChart**: A state-of-the-art chart-to-dict model that uses an auxiliary token to improve numerical value parsing.
2. **Creation of the ChartY Benchmark**: A standardized benchmark for chart-to-dict tasks, offering a wide array of topics, chart types, and languages.
3. **Experimental Results**: OneChart achieves SOTA performance in structural extraction, with significant improvements in accuracy and reliability.
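To make the chart-to-dict task concrete, the following sketch shows what a parsed chart might look like as a Python dictionary and how its numeric values could be checked against a reference within a relative tolerance. The field names (`title`, `source`, `x_title`, `y_title`, `values`), the sample data, and the `match_rate` helper are hypothetical illustrations. The helper is a simplified agreement measure, not the AP metric used in the paper.

```python
import math

# Hypothetical chart-to-dict output for a single-series bar chart.
predicted = {
    "title": "Quarterly revenue",
    "source": "",
    "x_title": "Quarter",
    "y_title": "Revenue (USD millions)",
    "values": {"Q1": 12.0, "Q2": 15.5, "Q3": 14.1},
}

# Hypothetical reference annotation for the same chart.
reference = {
    "title": "Quarterly revenue",
    "source": "",
    "x_title": "Quarter",
    "y_title": "Revenue (USD millions)",
    "values": {"Q1": 12.0, "Q2": 15.0, "Q3": 14.0},
}


def match_rate(pred: dict, ref: dict, rel_tol: float = 0.05) -> float:
    """Fraction of reference entries that the prediction reproduces.

    An entry counts as matched when its key exists in the prediction
    and the predicted value is within `rel_tol` of the reference value.
    """
    ref_values = ref["values"]
    if not ref_values:
        return 0.0
    pred_values = pred.get("values", {})
    hits = sum(
        1
        for key, value in ref_values.items()
        if key in pred_values
        and math.isclose(pred_values[key], value, rel_tol=rel_tol)
    )
    return hits / len(ref_values)


if __name__ == "__main__":
    print(match_rate(predicted, reference))
    print(match_rate(predicted, reference, rel_tol=0.01))
```

Tightening `rel_tol` makes the check stricter, which mirrors the general idea of evaluating structural extraction at several error tolerances.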
The methodology behind OneChart is detailed in five key areas: Data Engine, Architecture, The Auxiliary Token, Training Process, and Inference. The paper also includes ablation studies and a comparison with state-of-the-art models, demonstrating the effectiveness of the proposed techniques. Overall, OneChart represents a substantial advancement in chart understanding and information extraction, with potential applications in various real-world scenarios.