29 Mar 2024 | Tonghui Ren, Yuankai Fan, Zhenying He, Ren Huang, Jiaqi Dai, Can Huang, Yinan Jing, Kai Zhang, Yifan Yang, X. Sean Wang
PURPLE is a novel approach that enhances the SQL generation capabilities of large language models (LLMs) for natural language to SQL (NL2SQL) translation. The method improves accuracy by retrieving demonstrations that contain the logical operator composition required for the task, thereby guiding LLMs toward better SQL translations. PURPLE achieves state-of-the-art performance of 80.5% exact-set match accuracy and 87.8% execution match accuracy on the Spider benchmark, and it maintains high accuracy across diverse benchmarks, budget constraints, and various LLMs, demonstrating both robustness and cost-effectiveness.
PURPLE consists of four key components: Schema Pruning, Skeleton Prediction, Demonstration Selection, and Database Adaption. Schema Pruning reduces the database information supplied to the model, focusing on only the elements relevant to the task. Skeleton Prediction infers the composition of logical operators the target SQL will need. Demonstration Selection retrieves relevant examples based on the predicted skeleton. Database Adaption adjusts the output to fit the specific database schema and SQL dialect, mitigating hallucination issues.
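To make the Demonstration Selection idea concrete, here is a minimal, hypothetical sketch of retrieving demonstrations by skeleton overlap. The `extract_skeleton` heuristic and the scoring function below are illustrative assumptions for this summary, not the paper's actual implementation, which predicts skeletons with a trained model and uses a more sophisticated matching scheme.

```python
# Illustrative sketch of skeleton-based demonstration selection (assumed
# simplification of PURPLE's approach, not the paper's implementation).
import re

# Reduce a SQL query to its composition of logical operators ("skeleton").
def extract_skeleton(sql: str) -> tuple:
    pattern = r"GROUP BY|ORDER BY|SELECT|FROM|JOIN|WHERE|HAVING|LIMIT|UNION|INTERSECT|EXCEPT"
    return tuple(m.group() for m in re.finditer(pattern, sql.upper()))

# Rank a pool of (question, SQL) demonstrations by how many operators
# their skeletons share with the skeleton predicted for the new question.
def select_demonstrations(predicted_skeleton, demo_pool, k=2):
    def score(demo):
        return len(set(extract_skeleton(demo["sql"])) & set(predicted_skeleton))
    return sorted(demo_pool, key=score, reverse=True)[:k]
```

For example, given a predicted skeleton containing `GROUP BY`, a demonstration whose SQL also aggregates would rank above a plain filter query, steering the LLM toward the right operator composition.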
PURPLE's performance is evaluated on four NL2SQL benchmarks: Spider, Spider-DK, Spider-SYN, and Spider-Realistic. It outperforms existing LLM-based and PLM-based approaches on exact-set match, execution match, and test-suite accuracy, achieving the highest scores on all three metrics. The method is flexible, allowing trade-offs between cost and performance, and robust, showing consistent results across different LLMs and benchmark settings. These results highlight the potential of PURPLE to enhance the capabilities of LLMs on NL2SQL tasks.