13 Mar 2025 | Zijin Hong, Zheng Yuan, Qinggang Zhang, Hao Chen, Junnan Dong, Feiran Huang, and Xiao Huang*
The paper provides a comprehensive survey of LLM-based text-to-SQL systems, covering both the challenges and the recent advances in the field. It begins by outlining the core technical challenges of text-to-SQL: linguistic complexity in natural-language questions, schema understanding and representation, rare and complex SQL operations, and cross-domain generalization. It then traces the evolution of text-to-SQL methods from rule-based systems through deep learning approaches to the integration of pre-trained language models (PLMs) and large language models (LLMs).

The survey reviews the major benchmarks and evaluation metrics, including component matching, exact matching, execution accuracy, and the valid efficiency score (see the sketch below). It then examines LLM-based text-to-SQL methods under two paradigms: in-context learning (ICL) and fine-tuning (FT). ICL methods are categorized into vanilla prompting, decomposition, prompt optimization, reasoning enhancement, and execution refinement; FT methods are categorized into enhanced architectures, pre-training, and data augmentation. The paper concludes by summarizing the current state of LLM-based text-to-SQL, highlighting the remaining challenges, and suggesting directions for future research.
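The summary names execution accuracy and the valid efficiency score without spelling them out, so here is a minimal Python sketch of one plausible reading: execution accuracy (EX) is the fraction of predicted queries whose result sets match the gold queries', and the valid efficiency score (VES) additionally weights each correct query by the square root of the gold-to-predicted runtime ratio, as in the BIRD benchmark. The SQLite harness and the `sqrt(t_gold / t_pred)` weighting are assumptions for illustration, not definitions taken verbatim from the survey.

```python
import math
import sqlite3
import time


def run_query(db_path: str, sql: str):
    """Execute SQL against a SQLite database, returning (rows, seconds)."""
    con = sqlite3.connect(db_path)
    try:
        start = time.perf_counter()
        rows = con.execute(sql).fetchall()
        elapsed = time.perf_counter() - start
        return frozenset(rows), elapsed  # order-insensitive result comparison
    finally:
        con.close()


def execution_accuracy(pairs):
    """EX: fraction of (db_path, pred_sql, gold_sql) triples whose results match."""
    correct = 0
    for db_path, pred_sql, gold_sql in pairs:
        try:
            pred_rows, _ = run_query(db_path, pred_sql)
            gold_rows, _ = run_query(db_path, gold_sql)
            correct += int(pred_rows == gold_rows)
        except sqlite3.Error:
            pass  # invalid predicted SQL counts as incorrect
    return correct / len(pairs)


def valid_efficiency_score(pairs):
    """VES: like EX, but each correct query is weighted by
    sqrt(gold_runtime / pred_runtime), rewarding efficient SQL (assumed
    BIRD-style formulation)."""
    total = 0.0
    for db_path, pred_sql, gold_sql in pairs:
        try:
            pred_rows, t_pred = run_query(db_path, pred_sql)
            gold_rows, t_gold = run_query(db_path, gold_sql)
            if pred_rows == gold_rows and t_pred > 0:
                total += math.sqrt(t_gold / t_pred)
        except sqlite3.Error:
            pass
    return total / len(pairs)
```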
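To make the ICL paradigm concrete, the sketch below shows "vanilla" zero-shot prompting for text-to-SQL: the database schema is serialized as CREATE TABLE statements, concatenated with the question, and the LLM's completion is taken as the predicted SQL. The template wording, the example schema, and the `generate` placeholder are illustrative assumptions, not a specific system from the survey.

```python
def build_zero_shot_prompt(schema_ddl: str, question: str) -> str:
    """Vanilla prompting: schema plus question, with no exemplars,
    decomposition, or reasoning steps."""
    return (
        "Given the following database schema, write a SQL query "
        "that answers the question. Return only the SQL.\n\n"
        f"{schema_ddl}\n\n"
        f"Question: {question}\n"
        "SQL:"
    )


# Hypothetical usage; `generate` stands in for any LLM completion API.
schema = "CREATE TABLE singer (singer_id INT PRIMARY KEY, name TEXT, age INT);"
prompt = build_zero_shot_prompt(schema, "How many singers are older than 30?")
# predicted_sql = generate(prompt)  # assumed interface, not from the survey
```

Decomposition, prompt optimization, reasoning enhancement, and execution refinement all elaborate on this baseline, e.g. by splitting the question into sub-queries, selecting few-shot exemplars, eliciting intermediate reasoning, or repairing SQL that fails to execute.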