TableVQA-Bench: A Visual Question Answering Benchmark on Multiple Table Domains

30 Apr 2024 | Yoonsik Kim, Moonbin Yim, Ka Yeon Song
The paper introduces TableVQA-Bench, a benchmark for evaluating table visual question-answering (TableVQA) capabilities. The benchmark is constructed by integrating pre-existing table question-answering (QA) and table structure recognition (TSR) datasets, addressing the lack of images and QA pairs in those datasets. Images are obtained either from an accompanying stylesheet or through a proposed table rendering system, and QA pairs are generated using a large language model (LLM) operating on text-formatted tables. The benchmark comprises 1,500 QA pairs across four domains: VWTQ, VWTQ-Syn, VTabFact, and FinTabNetQA. The performance of various multi-modal large language models (MLLMs) is evaluated, with GPT-4V achieving the highest accuracy. The study also highlights the importance of visual features and the role of the number of vision queries in TableVQA performance. Additionally, the paper compares MLLMs with their LLM backbones, finding that processing visual table inputs is more challenging than processing text-formatted tables. The TableVQA-Bench dataset and evaluation code are available on GitHub.
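To make the evaluation setup concrete, below is a minimal sketch of how predictions from an MLLM could be scored against the benchmark's QA pairs using a simple exact-match accuracy. The file name, field names, and the stub `model_answer` function are illustrative assumptions, not the benchmark's actual schema or released evaluation code (the official scoring may differ per domain).

```python
import json

def exact_match_accuracy(predictions, references):
    """Fraction of predictions that exactly match the reference answer
    after simple normalization (lowercasing and whitespace collapsing)."""
    def norm(s):
        return " ".join(str(s).lower().split())
    if not references:
        return 0.0
    correct = sum(norm(p) == norm(r) for p, r in zip(predictions, references))
    return correct / len(references)

def model_answer(image_path, question):
    """Stub standing in for whatever MLLM is being evaluated:
    it would take a table image and a question and return an answer string."""
    return ""

# Hypothetical QA-pair format: one JSON object per line with an image
# reference, a question, and a gold answer (field names are assumptions).
with open("tablevqa_bench.jsonl") as f:
    examples = [json.loads(line) for line in f]

preds = [model_answer(ex["image_path"], ex["question"]) for ex in examples]
refs = [ex["answer"] for ex in examples]
print(f"Accuracy: {exact_match_accuracy(preds, refs):.3f}")
```

In practice, the released evaluation code on GitHub should be used for comparable numbers; this sketch only illustrates the overall image-question-answer scoring loop.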