13 May 2024 | Dongjun Lee, Choongwon Park, Jaehyuk Kim, Heesoo Park
This paper introduces a novel approach, MCS-SQL, to enhance the accuracy and robustness of in-context learning (ICL) for text-to-SQL generation. The proposed method leverages multiple prompts to explore a broader search space for possible answers and effectively aggregates them. Specifically, the approach includes three main steps: schema linking, multiple SQL generation, and selection. During schema linking, the system robustly selects relevant tables and columns from the database schema using multiple prompts. Subsequently, various candidate SQL queries are generated based on diverse prompts and refined schemas. Finally, the candidate queries are filtered based on confidence scores, and the optimal query is selected through a multiple-choice selection process presented to the LLM.
Evaluations on the BIRD and Spider benchmarks show that MCS-SQL achieves significantly higher execution accuracies (65.5% and 89.6%, respectively) than previous ICL-based methods, establishing a new state of the art on the BIRD benchmark. The study also highlights the importance of schema linking and the impact of prompt engineering on the accuracy of LLMs in text-to-SQL tasks.
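The three steps described above (schema linking, multiple SQL generation, and selection) can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the `LLM` callable, the prompt templates, the "table.column" reply format, the 0.2 threshold, and the canned `fake_llm` are all hypothetical. The summary does not say how confidence scores are computed, so the sketch assumes a candidate's confidence is the share of candidates whose execution result matches its own.

```python
from __future__ import annotations

import sqlite3
from collections import Counter, defaultdict
from typing import Callable

LLM = Callable[[str], str]  # hypothetical interface: prompt in, model text out


def link_schema(llm: LLM, question: str, schema: dict[str, list[str]],
                templates: list[str]) -> dict[str, set[str]]:
    """Step 1: union the columns selected under each prompt variant."""
    linked: dict[str, set[str]] = defaultdict(set)
    for template in templates:
        reply = llm(template.format(question=question, schema=schema))
        for line in reply.splitlines():  # assumed format: one "table.column" per line
            table, _, column = line.strip().partition(".")
            if column in schema.get(table, []):
                linked[table].add(column)
    return dict(linked)


def generate_candidates(llm: LLM, question: str, linked: dict[str, set[str]],
                        templates: list[str]) -> list[str]:
    """Step 2: one candidate query per prompt variant, using the refined schema."""
    return [llm(t.format(question=question, schema=linked)).strip() for t in templates]


def execute(conn: sqlite3.Connection, sql: str):
    """Return an order-insensitive, hashable result, or None if the query fails."""
    try:
        return tuple(sorted(map(repr, conn.execute(sql).fetchall())))
    except sqlite3.Error:
        return None


def filter_by_confidence(conn: sqlite3.Connection, candidates: list[str],
                         threshold: float = 0.2) -> list[tuple[str, float]]:
    """Step 3a: keep one query per distinct result whose confidence clears the threshold.

    Assumption: confidence = fraction of all candidates sharing the same result.
    """
    outcomes = [(sql, execute(conn, sql)) for sql in candidates]
    counts = Counter(res for _, res in outcomes if res is not None)
    kept: dict[tuple, tuple[str, float]] = {}
    for sql, res in outcomes:
        if res is None:
            continue
        confidence = counts[res] / len(candidates)
        if confidence >= threshold and res not in kept:
            kept[res] = (sql, confidence)
    return sorted(kept.values(), key=lambda pair: pair[1], reverse=True)


def select(llm: LLM, question: str, survivors: list[tuple[str, float]]) -> str | None:
    """Step 3b: present the surviving queries to the LLM as a multiple-choice question."""
    if not survivors:
        return None
    if len(survivors) == 1:
        return survivors[0][0]
    options = "\n".join(f"{chr(65 + i)}. {sql}" for i, (sql, _) in enumerate(survivors))
    reply = llm(f"Question: {question}\nPick the best SQL query.\n{options}\nAnswer with a letter:")
    index = ord(reply.strip()[:1].upper() or "A") - 65
    return survivors[index][0] if 0 <= index < len(survivors) else survivors[0][0]


def fake_llm(prompt: str) -> str:
    """Canned stand-in for a real model so the sketch runs offline."""
    if prompt.startswith("LINK"):
        return "singer.name\nsinger.age"
    if prompt.startswith("GEN-A"):
        return "SELECT name FROM singer WHERE age > 30"
    if prompt.startswith("GEN-B"):
        return "SELECT name FROM singer WHERE age >= 31"
    if prompt.startswith("GEN-C"):
        return "SELECT nme FROM singer"  # deliberately invalid column
    return "A"


def run_demo() -> str | None:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE singer (name TEXT, age INTEGER, country TEXT)")
    conn.executemany("INSERT INTO singer VALUES (?, ?, ?)",
                     [("Ann", 25, "KR"), ("Bob", 40, "US")])
    schema = {"singer": ["name", "age", "country"]}
    question = "Which singers are older than 30?"
    linked = link_schema(fake_llm, question, schema,
                         ["LINK v1: {question} {schema}", "LINK v2: {schema} {question}"])
    candidates = generate_candidates(
        fake_llm, question, linked,
        ["GEN-A {question} {schema}", "GEN-B {question} {schema}", "GEN-C {question} {schema}"])
    return select(fake_llm, question, filter_by_confidence(conn, candidates))


if __name__ == "__main__":
    print(run_demo())
```

In this toy run the two valid candidates return the same rows and therefore share a confidence of 2/3, while the failing candidate is dropped, so only one query survives and the multiple-choice prompt is skipped. With several distinct surviving results, `select` would ask the model to choose among them.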