TABLELLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios


1 Apr 2024 | Xiaokang Zhang¹, Jing Zhang¹, Zeyao Ma¹, Yang Li¹, Bohan Zhang¹, Guanlin Li¹, Zijun Yao², Kangli Xu, Jinchang Zhou², Daniel Zhang-Li², Jifan Yu², Shu Zhao³, Juanzi Li², Jie Tang²
TABLELLM is a large language model (LLM) with 13 billion parameters, designed for tabular data manipulation in real-world office scenarios. It is fine-tuned from CodeLlama (13B) on a 1:1 mixture of document-embedded and spreadsheet-embedded training data, constructed with a distant-supervision method that includes reasoning-process extension and cross-way validation to ensure data quality. The two formats call for distinct processing approaches: TABLELLM handles document-embedded tables with an inner-parameter-driven approach, answering directly from the model's parameters, and spreadsheet-embedded tables with a code-driven method, generating code that is executed against the table.

Evaluated on a benchmark covering both formats across query, update, merge, and chart-generation operations, TABLELLM outperforms existing general-purpose and tabular-data-focused LLMs, including GPT-3.5, and even surpasses GPT-4 in spreadsheet-embedded scenarios. Extensive user studies and benchmark evaluations further support its robust generalization ability. TABLELLM is publicly available, including the model checkpoint, source code, benchmarks, and a web application that lets users perform tabular manipulation tasks, such as merging tables and generating charts, on both data formats.
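The code-driven method for spreadsheet-embedded tables can be illustrated with a minimal sketch: instead of answering from its parameters, the model emits a short program that is executed against the table. Everything below (the sample table, the `run_generated_code` helper, and the generated snippet itself) is illustrative and not taken from the paper's released code.

```python
# Hedged sketch of the code-driven approach for spreadsheet-embedded
# tables: a model-generated snippet is executed against the table in a
# fresh namespace, and the value it binds to `result` is the answer.

table = [
    {"region": "East", "sales": 120},
    {"region": "West", "sales": 80},
    {"region": "East", "sales": 50},
]

# In TABLELLM's pipeline, a snippet like this would be produced by the
# model in response to a user query such as
# "What are the total sales in the East region?"
generated_code = """
result = sum(row["sales"] for row in table if row["region"] == "East")
"""

def run_generated_code(code: str, table: list) -> object:
    """Execute model-generated code with the table in scope and
    return the value it bound to `result`."""
    namespace = {"table": table}
    exec(code, namespace)
    return namespace["result"]

print(run_generated_code(generated_code, table))  # 170
```

Executing generated code rather than asking the model to read the whole spreadsheet is what lets the approach scale to large tables: only the program, not the data, has to fit in the model's context.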
The model's performance is validated through extensive testing and user feedback, demonstrating its practical utility in real-world office scenarios.
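Cross-way validation, mentioned above as a data-quality filter, can be sketched as follows: a candidate training example is kept only when two independent answering "ways" agree. The two ways below (a direct computation standing in for textual reasoning, and an executed snippet standing in for the code-driven path) are illustrative stand-ins, not the paper's implementation.

```python
# Hedged sketch of cross-way validation: keep a sample only if the
# inner-parameter-style answer and the code-execution answer agree.

def answer_by_text(table: list, region: str) -> int:
    """Stand-in for the direct (textual-reasoning) way."""
    return sum(r["sales"] for r in table if r["region"] == region)

def answer_by_code(table: list, region: str) -> int:
    """Stand-in for the code-driven way: build and execute a snippet."""
    code = (
        'result = sum(r["sales"] for r in table '
        f'if r["region"] == "{region}")'
    )
    namespace = {"table": table}
    exec(code, namespace)
    return namespace["result"]

def cross_way_validate(table: list, region: str):
    """Return the agreed answer, or None if the two ways disagree
    (in which case the sample would be discarded)."""
    a = answer_by_text(table, region)
    b = answer_by_code(table, region)
    return a if a == b else None

table = [{"region": "East", "sales": 120}, {"region": "West", "sales": 80}]
print(cross_way_validate(table, "East"))  # 120
```

Filtering on agreement between independent derivations is a standard distant-supervision trick: disagreement usually signals a noisy label, so discarding those samples raises the quality of the training set without manual annotation.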