Large Language Model for Table Processing: A Survey

2024 | Weizheng Lu, Jing Zhang, Ju Fan, Zihao Fu, Yueguo Chen, Xiaoyong Du
This survey provides a comprehensive overview of table-related tasks, examining both user scenarios and technical aspects. It covers traditional tasks like table question answering as well as emerging fields such as spreadsheet manipulation and table data analysis. The paper summarizes training techniques for LLMs and VLMs tailored to table processing, discusses prompt engineering, particularly the use of LLM-powered agents, and highlights challenges such as processing implicit user intentions and extracting information from various table sources. Tables are central to everyday activities such as database queries, spreadsheet manipulation, web table question answering, and extracting information from tables in images; automating these tasks with LLMs or VLMs offers significant public benefit.

The paper discusses the unique challenges of table processing, including structured data, complex reasoning, and the need to integrate external tools. It categorizes methods based on the latest paradigms in LLM usage, focusing on instruction-tuning and LLM-powered agent approaches. The paper outlines four types of tables: spreadsheet, web table, database, and document. It also discusses the differences between tables and text, highlighting the two-dimensional structure of tables and their reliance on schemas.

Table tasks include table QA, fact verification, data cleaning, and data analysis. The paper also covers the data lifecycle, including data entry, cleaning, CRUD operations, analysis, and visualization, and discusses table data representation, both textual and visual.

It explores training techniques for LLMs and VLMs, including pre-LLM-era methods, instruction tuning, code tuning, and hybrid approaches. It also discusses prompting strategies for LLMs, including the use of LLM-powered agents, and highlights challenges such as cost, accuracy, and privacy.

Finally, the paper summarizes open-source datasets, benchmarks, and software that can facilitate the community's progress, highlighting recent datasets and benchmarks and emphasizing features such as robustness. It concludes with a discussion of the challenges and future directions in table processing using LLMs and VLMs.
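To make the textual-representation idea concrete, below is a minimal sketch of serializing a small table into Markdown before placing it in an LLM prompt, one common text representation for table tasks. The rows, helper name, and prompt wording are illustrative assumptions, not examples taken from the paper.

```python
# Minimal sketch: serialize a small table into Markdown text for an LLM prompt.
# The rows, helper name, and prompt wording are illustrative assumptions.

rows = [
    {"city": "Berlin", "population_millions": 3.7},
    {"city": "Madrid", "population_millions": 3.3},
]

def to_markdown(records):
    """Render a list of dicts as a Markdown table string."""
    headers = list(records[0].keys())
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for record in records:
        lines.append("| " + " | ".join(str(record[h]) for h in headers) + " |")
    return "\n".join(lines)

prompt = (
    "Answer the question using only the table below.\n\n"
    + to_markdown(rows)
    + "\n\nQuestion: Which city has the larger population?"
)
print(prompt)
```

Markdown is only one serialization choice; HTML, CSV, and JSON renderings of tables are also common in the literature, depending on the model and task.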
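For the instruction-tuning techniques the summary mentions, a single training sample typically pairs a natural-language instruction and a serialized table with the expected output. The JSON field names and label vocabulary below are a hedged illustration of one common layout, not a format prescribed by the survey.

```python
# Hedged illustration of one table instruction-tuning sample (fact verification).
# The field names and label vocabulary are assumptions, not the survey's format.

import json

sample = {
    "instruction": "Decide whether the statement is supported by the table.",
    "input": (
        "| city | population_millions |\n"
        "| --- | --- |\n"
        "| Berlin | 3.7 |\n"
        "| Madrid | 3.3 |\n\n"
        "Statement: Madrid has a larger population than Berlin."
    ),
    "output": "refuted",
}

# Instruction-tuning corpora are often stored as one JSON object per line (JSONL).
print(json.dumps(sample))
```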
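The LLM-powered agent approaches the paper discusses often have the model write executable code (for example, pandas or SQL) over the table rather than answer directly, which is one way external tools are integrated. The sketch below shows that generate-then-execute pattern; `ask_llm` is a hypothetical placeholder for any chat-completion API, and its hard-coded reply stands in for a real model response.

```python
# Hedged sketch of the generate-then-execute pattern used by table agents:
# the LLM writes pandas code over the table, and the agent runs it.
# `ask_llm` is a hypothetical placeholder, not an API from the survey.

import pandas as pd

def ask_llm(prompt: str) -> str:
    # Placeholder: a real agent would call an LLM API here and return its code.
    return "result = df.loc[df['population_millions'].idxmax(), 'city']"

def answer_with_code(df: pd.DataFrame, question: str):
    """Ask the LLM for one line of pandas code over `df`, run it, return `result`."""
    prompt = (
        f"Table columns: {list(df.columns)}\n"
        f"Question: {question}\n"
        "Write one line of pandas code that stores the answer in a variable named `result`."
    )
    code = ask_llm(prompt)
    scope = {"df": df}
    exec(code, {}, scope)  # a production agent would sandbox this execution
    return scope["result"]

df = pd.DataFrame({"city": ["Berlin", "Madrid"], "population_millions": [3.7, 3.3]})
print(answer_with_code(df, "Which city has the larger population?"))  # -> Berlin
```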