TabSQLify: Enhancing Reasoning Capabilities of LLMs Through Table Decomposition

15 Apr 2024 | Md Mahadi Hasan Nahid, Davood Rafiei
**Authors:** Md Mahadi Hasan Nahid, Davood Rafiei
**Institution:** University of Alberta
**Emails:** mnahid@ualberta.ca, drafiei@ualberta.ca

**Abstract:** Table reasoning is a challenging task that requires understanding both natural language questions and structured tabular data. Large language models (LLMs) excel at natural language understanding and generation but struggle with large tables due to their limited input length. This paper introduces TabSQLify, a novel method that leverages text-to-SQL generation to decompose tables into smaller, relevant sub-tables, containing only the information essential for answering questions or verifying statements, before performing the reasoning task. Comprehensive evaluations on four datasets show that TabSQLify achieves comparable or superior performance relative to prevailing methods that rely on full tables as input. In addition, TabSQLify significantly reduces the input context length, making it more scalable and efficient for large-scale table reasoning applications. It achieves an accuracy of 64.7% on the WikiTQ benchmark and 79.5% on the TabFact benchmark, surpassing other LLM-based baselines built on gpt-3.5-turbo (ChatGPT).

**Contributions:**
1. We present a novel approach that uses text-to-SQL generation to decompose tables into smaller, contextually relevant sub-tables, designed specifically for table reasoning tasks.
2. Our model outperforms some of the leading models that employ multiple responses and self-consistency techniques.
3. Our evaluation on challenging table reasoning datasets demonstrates the strong performance of our method compared to existing methods that rely on full tables as input.

**Methodology:** TabSQLify integrates symbolic methods with the reasoning power of LLMs. It involves two key steps (a sketch follows this list):
1. Generating SQL queries from natural language questions or statements using an LLM under few-shot prompting, then executing these queries on the original tables to obtain sub-tables containing only the essential information.
2. Prompting the LLM with the sub-table and the question or claim to generate the answer.
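The paper's few-shot prompts are not reproduced in this summary, so the following is only a minimal sketch of the two-step pipeline over a SQLite copy of the table. The `call_llm` helper, the prompt templates, and the `serialize` formatting are illustrative assumptions, not the authors' exact design.

```python
import sqlite3

def call_llm(prompt: str) -> str:
    """Hypothetical helper wrapping an LLM API (e.g., gpt-3.5-turbo).
    The paper uses few-shot prompting; the exemplars are omitted here."""
    raise NotImplementedError("plug in your LLM client")

def serialize(columns, rows):
    """Render a (sub-)table as pipe-separated text for the prompt."""
    header = " | ".join(columns)
    body = [" | ".join(str(v) for v in row) for row in rows]
    return "\n".join([header, *body])

def tabsqlify(conn: sqlite3.Connection, table_name, columns, question):
    # Step 1: generate a SQL query from the question and the table
    # schema, then execute it to obtain a small, relevant sub-table.
    sql = call_llm(
        f"Table {table_name} has columns: {', '.join(columns)}.\n"
        f"Write a SQL query selecting only the rows and columns needed "
        f"to answer: {question}\nSQL:"
    )
    cursor = conn.execute(sql)
    sub_columns = [d[0] for d in cursor.description]
    sub_rows = cursor.fetchall()

    # Step 2: answer from the sub-table alone; the full table never
    # enters this prompt, which is what keeps the context short.
    return call_llm(
        f"Sub-table:\n{serialize(sub_columns, sub_rows)}\n"
        f"Question: {question}\nAnswer:"
    )
```

Because only the rows and columns returned by the generated query are serialized into the second prompt, tables that would otherwise exceed the model's input limit can still be handled.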
**Evaluation:** TabSQLify is evaluated on four datasets: WikiTQ, FeTaQA, TabFact, and WikiSQL. The results show that TabSQLify outperforms a range of baselines, including pre-training- and fine-tuning-based models as well as LLM-based models. It achieves high accuracy on challenging tasks and remains scalable and robust under token-limit constraints.

**Conclusion:** TabSQLify offers a novel approach to table reasoning that decomposes tables into smaller sub-tables, enhancing the reasoning capabilities of LLMs. The method opens a new perspective and direction for table reasoning research and aims to inspire future work on combining natural language understanding with structured data processing.