MCS-SQL: Leveraging Multiple Prompts and Multiple-Choice Selection For Text-to-SQL Generation

13 May 2024 | Dongjun Lee, Choongwon Park, Jaehyuk Kim, Heesoo Park
This paper proposes MCS-SQL, an approach that leverages multiple prompts and multiple-choice selection to improve the accuracy and robustness of text-to-SQL generation with in-context learning (ICL). The method consists of three steps: schema linking, multiple SQL generation, and selection. In schema linking, multiple prompts are used to robustly identify the tables and columns in the database schema that are relevant to the question. In the multiple SQL generation step, varied prompts are used to generate diverse candidate SQL queries. Finally, the selection step filters the candidates by confidence score and asks the LLM to choose the most accurate query from the remaining candidates, presented as a multiple-choice question.

The method achieves significant improvements on the BIRD and Spider benchmarks. On BIRD, it reaches an execution accuracy (EX) of 65.5% and a valid efficiency score (VES) of 71.4%, outperforming the best previous ICL-based methods by 5.9 and 3.7 percentage points, respectively. On Spider, it reaches an EX of 89.6%, surpassing the previous state-of-the-art (SOTA) ICL-based approach by 3.0 percentage points. The gains also hold across the difficulty levels of both the BIRD and Spider datasets.

The study highlights how sensitive large language models (LLMs) are to the structure and content of their prompts. By using multiple prompts, the approach explores a broader search space of possible answers and aggregates them into more robust SQL queries; confidence-based filtering and multiple-choice selection then further improve the accuracy of the final query. Overall, MCS-SQL outperforms existing ICL-based methods in both accuracy (EX) and efficiency (VES) on BIRD and Spider, and it sets a new state-of-the-art result on the BIRD benchmark.
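The selection step (confidence-based filtering followed by multiple-choice selection) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: candidates are assumed to be grouped by their execution result on an SQLite database, a group's confidence is assumed to be the fraction of all candidates that produce that result, and groups below a hypothetical `min_confidence` threshold are dropped before the survivors are formatted as a lettered multiple-choice prompt. The function names, threshold, prompt wording, and toy data are all hypothetical, and the actual LLM call is omitted.

```python
import sqlite3
import string
from collections import defaultdict


def execution_key(conn: sqlite3.Connection, sql: str):
    """Run one candidate and return a hashable key for its result, or None if it fails.

    A frozenset of rows ignores row order and duplicate rows (an assumption of this sketch).
    """
    try:
        return frozenset(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def filter_by_confidence(conn, candidates, min_confidence=0.2):
    """Group candidates by execution result and keep groups whose share of all candidates
    is at least min_confidence. Returns (representative_sql, confidence) pairs, most confident first.
    """
    groups = defaultdict(list)
    for sql in candidates:
        key = execution_key(conn, sql)
        if key is not None:  # candidates that fail to execute are discarded
            groups[key].append(sql)
    kept = []
    for sqls in groups.values():
        confidence = len(sqls) / len(candidates)
        if confidence >= min_confidence:
            kept.append((sqls[0], confidence))  # first query in each group stands in for the group
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept


def build_multiple_choice_prompt(question, options):
    """Format surviving candidates as a lettered multiple-choice question (hypothetical wording)."""
    lines = [f"Question: {question}", "Which SQL query correctly answers the question?"]
    for letter, (sql, _confidence) in zip(string.ascii_uppercase, options):
        lines.append(f"({letter}) {sql}")
    lines.append("Answer with the letter of the correct option.")
    return "\n".join(lines)


# Toy database and candidate pool (illustrative data, not from the paper).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO singer VALUES (?, ?)", [("Ann", 30), ("Bo", 45), ("Cy", 28)])
candidates = [
    "SELECT name FROM singer WHERE age > 29",
    "SELECT name FROM singer WHERE age >= 30",
    "SELECT name FROM singer WHERE age > 29 ORDER BY name",
    "SELECT name FROM singer WHERE age > 40",
    "SELECT nme FROM singer",  # misspelled column: fails to execute
]
options = filter_by_confidence(conn, candidates, min_confidence=0.15)
prompt = build_multiple_choice_prompt("Which singers are older than 29?", options)
print(prompt)
```

In this toy run, the three queries returning Ann and Bo form one group with confidence 0.6, the query returning only Bo forms a group with confidence 0.2, and the failing query is discarded, so the prompt offers two options. Raising `min_confidence` to 0.3 would leave a single option, showing how the threshold trades candidate diversity against noise.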
The study also underscores the importance of schema linking and analyzes how different few-shot selection strategies affect text-to-SQL performance. The findings suggest that combining multiple prompts with multiple-choice selection can substantially improve the accuracy and robustness of LLM-based text-to-SQL generation.
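To illustrate why multiple prompts can make schema linking more robust, the sketch below builds several prompt variants that list the same tables in different orders, collects the tables the model marks as relevant under each variant, and takes their union, so a table overlooked under one ordering can still be recovered under another. Rotating the table order and aggregating by union are assumptions of this sketch, not details confirmed by the summary above. `schema_prompt`, `link_tables`, and `fake_llm` are hypothetical names; `fake_llm` is a deterministic, deliberately order-sensitive stand-in for a real model call, so the example runs offline.

```python
from typing import Callable, Iterable


def schema_prompt(question: str, tables: list[str]) -> str:
    """Render one prompt variant that lists the tables in the given order (hypothetical format)."""
    listing = "\n".join(f"- {t}" for t in tables)
    return f"Tables:\n{listing}\nQuestion: {question}\nList the tables needed to answer the question."


def link_tables(question: str, tables: list[str],
                ask_llm: Callable[[str], Iterable[str]], num_prompts: int = 3) -> set[str]:
    """Ask the model once per prompt variant (table order rotated by i) and union the answers."""
    linked: set[str] = set()
    for i in range(num_prompts):
        shift = i % len(tables)
        order = tables[shift:] + tables[:shift]
        answer = ask_llm(schema_prompt(question, order))
        # Keep only names that exist in the schema, discarding hallucinated tables.
        linked |= {t for t in answer if t in tables}
    return linked


def fake_llm(prompt: str) -> list[str]:
    """Toy, order-sensitive stand-in for an LLM: it only notices relevant tables among the first two listed."""
    listed = [line[2:] for line in prompt.splitlines() if line.startswith("- ")]
    question = prompt.split("Question: ")[1].split("\n")[0]
    return [t for t in listed[:2] if t in question]


tables = ["stadium", "concert", "singer", "ticket"]
question = "How many tickets did each singer sell per concert?"
print(link_tables(question, tables, fake_llm, num_prompts=1))  # {'concert'}
print(sorted(link_tables(question, tables, fake_llm, num_prompts=3)))  # ['concert', 'singer', 'ticket']
```

With a single prompt the toy model finds only `concert`; with three rotated prompts the union recovers all three relevant tables while still excluding `stadium`. Taking a union favors recall over precision, which seems a reasonable bias for schema linking, since a missing table usually makes a correct query impossible, whereas an extra table mainly lengthens the prompt.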
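Execution accuracy (EX), the metric behind the 65.5% (BIRD) and 89.6% (Spider) figures reported for MCS-SQL, counts a predicted query as correct when executing it returns the same result as the gold query. The sketch below approximates EX on an SQLite database by comparing results as sets of rows. This is an assumption of the sketch: the official BIRD and Spider evaluation scripts differ in details, so it should not be used to reproduce the published numbers. The function names and toy data are hypothetical.

```python
import sqlite3


def result_set(conn, sql):
    """Execute sql and return its rows as a set, or None if execution fails."""
    try:
        return set(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(conn, pairs):
    """Percentage of (predicted_sql, gold_sql) pairs whose execution results match as row sets."""
    correct = 0
    for predicted, gold in pairs:
        pred_rows = result_set(conn, predicted)
        # A prediction that fails to execute is always counted as wrong.
        if pred_rows is not None and pred_rows == result_set(conn, gold):
            correct += 1
    return 100.0 * correct / len(pairs)


# Toy database and (predicted, gold) pairs (illustrative data, not from the paper).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO singer VALUES (?, ?)", [("Ann", 30), ("Bo", 45), ("Cy", 28)])
pairs = [
    ("SELECT name FROM singer WHERE age >= 30", "SELECT name FROM singer WHERE age > 29"),  # match
    ("SELECT COUNT(*) FROM singer", "SELECT COUNT(name) FROM singer"),  # match
    ("SELECT name FROM singer WHERE age > 40", "SELECT name FROM singer WHERE age > 29"),  # mismatch
    ("SELECT nme FROM singer", "SELECT name FROM singer"),  # execution error
]
print(execution_accuracy(conn, pairs))  # 50.0
```

Because EX compares execution results rather than SQL strings, syntactically different but equivalent queries (such as `age >= 30` versus `age > 29` on integer ages) are both counted as correct.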