2024-10-11 | Weichao Zhao, Hao Feng, Qi Liu, Jingqun Tang, Shu Wei, Binghong Wu, Lei Liao, Yongjie Ye, Hao Liu, Wengang Zhou, Houqiang Li, Can Huang
The paper introduces TabPedia, a novel large vision-language model designed to comprehensively understand visual tables. TabPedia incorporates a *concept synergy* mechanism, which abstracts various visual table understanding (VTU) tasks and multi-source visual embeddings as concepts. This mechanism allows TabPedia to seamlessly integrate tasks such as table detection, structure recognition, querying, and question answering, leveraging the capabilities of large language models (LLMs). The concept synergy enables table perception and comprehension tasks to work harmoniously, effectively utilizing clues from corresponding source perception embeddings. To evaluate VTU tasks in real-world scenarios, the authors establish the ComTQA benchmark, featuring approximately 9,000 QA pairs. Extensive experiments on various public benchmarks validate the effectiveness of TabPedia, demonstrating its superior performance in understanding visual tables when all concepts work in synergy. The benchmark and source code are open-sourced to promote further research and development.