This is a continuously updated handbook that helps readers track the latest Text-to-SQL techniques in the literature and provides practical guidance for researchers and practitioners.

HKUSTDial/NL2SQL_Handbook

Text-to-SQL Handbook

NL2SQL Handbook

This is the official repository for two papers: [TKDE'25] A Survey of Text-to-SQL in the Era of LLMs: Where are we, and where are we going? and [VLDB'24] The Dawn of Natural Language to SQL: Are We Fully Ready? Here you can explore the latest advancements in Text-to-SQL (a.k.a. NL2SQL) research. We provide a comprehensive survey, in-depth research papers, and benchmark evaluations.

  • A Survey of Text-to-SQL in the Era of LLMs: Where are we, and where are we going?
  • Natural Language to SQL: State of the Art and Open Problems
  • The Dawn of Natural Language to SQL: Are We Fully Ready?

📧 If we missed any interesting work, please connect with us.

@article{liu2025survey,
  title={A Survey of Text-to-SQL in the Era of LLMs: Where are we, and where are we going?},
  author={Liu, Xinyu and Shen, Shuyu and Li, Boyan and Ma, Peixian and Jiang, Runzhi and Zhang, Yuxin and Fan, Ju and Li, Guoliang and Tang, Nan and Luo, Yuyu},
  journal={IEEE Transactions on Knowledge and Data Engineering},
  year={2025},
  publisher={IEEE}
}

🧭 Text-to-SQL Introduction

Translating users' natural language (NL) queries into SQL queries can significantly lower the barrier to accessing relational databases and supports various commercial applications. The performance of Text-to-SQL has improved greatly with the emergence of language models (LMs). In this context, it is crucial to assess where we currently stand, determine which Text-to-SQL solutions practitioners should adopt for specific scenarios, and identify the research topics that researchers should explore next.

📈 Text-to-SQL Lifecycle

  • Model: Text-to-SQL translation techniques that not only tackle NL ambiguity and under-specification, but also map the NL query to the database schema and instances;

  • Data: Everything from collecting training data and synthesizing data to address training data scarcity, to building Text-to-SQL benchmarks;

  • Evaluation: Evaluating Text-to-SQL methods from multiple angles using different metrics and granularities;

  • Error Analysis: Analyzing Text-to-SQL errors to find their root causes and guide the evolution of Text-to-SQL models.

🤔 Where Are We?

We categorize the challenges of Text-to-SQL into five levels, each addressing specific hurdles. The first three levels cover challenges that have been or are currently being addressed, reflecting the progressive development of Text-to-SQL. The fourth level represents the challenges we aim to tackle in the LLM stage, while the fifth level outlines our vision for Text-to-SQL systems in the next five years.

We describe the evolution of Text-to-SQL solutions from the perspective of language models, categorizing it into four stages. For each stage of Text-to-SQL, we analyze the changes in target users and the extent to which challenges are addressed.

🧩 Module-based Text-to-SQL Methods

We summarize the key modules of Text-to-SQL solutions that use language models.

  • Pre-processing serves as an enhancement to the model’s inputs in the Text-to-SQL parsing process. You can get more details from this chapter: Pre-Processing
  • Text-to-SQL translation methods constitute the core of the Text-to-SQL solution, responsible for converting input natural language queries into SQL queries. You can get more details from this chapter: Text-to-SQL Translation Methods
  • Post-processing is a crucial step to refine the generated SQL queries, ensuring they meet user expectations more accurately. You can get more details from this chapter: Post-Processing
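
To make the three modules above concrete, here is a minimal Python sketch of how they might fit together. It is an illustration under stated assumptions, not an implementation from the survey: `link_schema`, `build_prompt`, `is_executable`, and `text_to_sql` are hypothetical names, the schema linking is a toy keyword match, and `translate` stands in for whatever language model call you use. The post-processing check executes the candidate query, so the sketch assumes read-only `SELECT` statements against SQLite.

```python
import sqlite3
from typing import Callable, Dict, List, Optional

Schema = Dict[str, List[str]]  # table name -> column names


def link_schema(question: str, schema: Schema) -> Schema:
    """Pre-processing (toy schema linking): keep tables mentioned in the question."""
    words = set(question.lower().replace("?", " ").split())
    linked = {
        table: cols
        for table, cols in schema.items()
        if table.lower() in words or any(c.lower() in words for c in cols)
    }
    return linked or schema  # fall back to the full schema if nothing matched


def build_prompt(question: str, schema: Schema) -> str:
    """Serialize the linked schema and the question into a model input."""
    lines = [f"Table {t}({', '.join(cols)})" for t, cols in schema.items()]
    return "\n".join(lines) + f"\nQuestion: {question}\nSQL:"


def is_executable(sql: str, conn: sqlite3.Connection) -> bool:
    """Post-processing check: does the query run on the database at all?"""
    try:
        conn.execute(sql).fetchall()
        return True
    except sqlite3.Error:
        return False


def text_to_sql(
    question: str,
    schema: Schema,
    conn: sqlite3.Connection,
    translate: Callable[[str], str],  # hypothetical model call: prompt -> SQL
    max_tries: int = 3,
) -> Optional[str]:
    """Pre-process, translate, and keep the first candidate that executes."""
    prompt = build_prompt(question, link_schema(question, schema))
    for _ in range(max_tries):
        sql = translate(prompt)
        if is_executable(sql, conn):
            return sql
    return None  # no executable candidate within the retry budget
```

Passing the translator in as a plain callable keeps the sketch independent of any particular model or API; a real system would replace the keyword match and the executability check with the techniques covered in the corresponding chapters.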

📚 Text-to-SQL Survey & Tutorial

  1. A Survey of Text-to-SQL in the Era of LLMs: Where are we, and where are we going?
  2. Natural Language to SQL: State of the Art and Open Problems.
  3. Next-generation database interfaces: A survey of LLM-based Text-to-SQL.
  4. A Survey on Employing Large Language Models for Text-to-SQL Tasks.
  5. Large Language Model Enhanced Text-to-SQL Generation: A Survey.
  6. From Natural Language to SQL: Review of LLM-based Text-to-SQL Systems.
  7. Natural language interfaces for tabular data querying and visualization: A survey.
  8. Natural Language Interfaces for Databases with Deep Learning.
  9. A survey on deep learning approaches for text-to-SQL.
  10. A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future Directions.
  11. Recent Advances in Text-to-SQL: A Survey of What We Have and What We Expect.
  12. A Deep Dive into Deep Learning Approaches for Text-to-SQL Systems.
  13. State of the Art and Open Challenges in Natural Language Interfaces to Data.
  14. Natural language to SQL: Where are we today?

📰 Text-to-SQL Paper List

  1. Alpha-SQL: Zero-Shot Text-to-SQL using Monte Carlo Tree Search
  2. NL2SQL-BUGs: A Benchmark for Detecting Semantic Errors in NL2SQL Translation.
  3. EllieSQL: Cost-Efficient Text-to-SQL with Complexity-Aware Routing.
  4. The Dawn of Natural Language to SQL: Are We Fully Ready?
  5. DeepEye-SQL: A Software-Engineering-Inspired Text-to-SQL Framework.
  6. Memo-SQL: Structured Decomposition and Experience-Driven Self-Correction for Training-Free NL2SQL.
  7. Structure-Guided Large Language Models for Text-to-SQL Generation.
  8. Sphinteract: Resolving Ambiguities in NL2SQL Through User Interaction.
  9. OmniSQL: Synthesizing High-quality Text-to-SQL Data at Scale.
  10. EVOSCHEMA: TOWARDS TEXT-TO-SQL ROBUSTNESS AGAINST SCHEMA EVOLUTION.
  11. Is Long Context All You Need? Leveraging LLM's Extended Context for NL2SQL.
  12. The Power of Constraints in Natural Language to SQL Translation.
  13. OpenSearch-SQL: Enhancing Text-to-SQL with Dynamic Few-shot and Consistency Alignment.
  14. Reliable Text-to-SQL with Adaptive Abstention.
  15. SNAILS: Schema Naming Assessments for Improved LLM-Based SQL Inference.
  16. Automated Validating and Fixing of Text-to-SQL Translation with Execution Consistency.
  17. Grounding Natural Language to SQL Translation with Data-Based Self-Explanations.
  18. AID-SQL: Adaptive In-Context Learning of Text-to-SQL with Difficulty-Aware Instruction and Retrieval-Augmented Generation.
  19. CLEAR: A Parser-Independent Disambiguation Framework for NL2SQL.
  20. CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL.
  21. Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows.
  22. ROUTE: Robust Multitask Tuning and Collaboration for Text-to-SQL.
  23. SHARE: An SLM-based Hierarchical Action CorREction Assistant for Text-to-SQL.
  24. DCG-SQL: Enhancing In-Context Learning for Text-to-SQL with Deep Contextual Schema Link Graph.
  25. Uncovering the Impact of Chain-of-Thought Reasoning for Direct Preference Optimization: Lessons from Text-to-SQL.
  26. STaR-SQL: Self-Taught Reasoner for Text-to-SQL.
  27. SQLGenie: A Practical LLM based System for Reliable and Efficient SQL Generation
  28. SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning.
  29. Confidence Estimation for Error Detection in Text-to-SQL Systems.
  30. SQLord: A Robust Enterprise Text-to-SQL Solution via Reverse Data Generation and Workflow Decomposition.
  31. DBCopilot: Scaling Natural Language Querying to Massive Databases.
  32. Utilising Large Language Models for Adversarial Attacks in Text-to-SQL: A Perpetrator and Victim Approach.
  33. You Only Read Once (YORO): Learning to Internalize Database Knowledge for Text-to-SQL.
  34. Boosting Text-to-SQL through Multi-grained Error Identification.
  35. Gen-SQL: Efficient Text-to-SQL By Bridging Natural Language Question And Database Schema With Pseudo-Schema.
  36. MAC-SQL: A Multi-Agent Collaborative Framework for Text-to-SQL.
  37. PARSQL: Enhancing Text-to-SQL through SQL Parsing and Reasoning.
  38. UCS-SQL: Uniting Content and Structure for Enhanced Semantic Bridging In Text-to-SQL.
  39. SQLForge: Synthesizing Reliable and Diverse Data to Enhance Text-to-SQL Reasoning in LLMs.
  40. Optimizing Reasoning for Text-to-SQL with Execution Feedback.
  41. Knowledge Base Construction for Knowledge-Augmented Text-to-SQL.
  42. SQLong: Enhanced NL2SQL for Longer Contexts with LLMs.
  43. Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL.
  44. Agentar-Scale-SQL: Advancing Text-to-SQL through Orchestrated Test-Time Scaling.
  45. Automatic Metadata Extraction for Text-to-SQL.
  46. DeepEye-SQL: A Software-Engineering-Inspired Text-to-SQL Framework.
  47. Rethinking Text-to-SQL: Dynamic Multi-turn SQL Interaction for Real-world Database Exploration.
  48. MARS-SQL: A MULTI-AGENT REINFORCEMENT LEARNING FRAMEWORK FOR TEXT-TO-SQL.
  49. RUBIKSQL: Lifelong Learning Agentic Knowledge Base as an Industrial NL2SQL System.
  50. CSC-SQL: Corrective Self-Consistency in Text-to-SQL via Reinforcement Learning.
  51. Cheaper, Better, Faster, Stronger: Robust Text-to-SQL without Chain-of-Thought or Fine-Tuning.
  52. SLM-SQL: An Exploration of Small Language Models for Text-to-SQL.
  53. Reward-SQL: Boosting Text-to-SQL via Stepwise Reasoning and Process-Supervised Rewards.
  54. Arctic-Text2SQL-R1: Simple Rewards, Strong Reasoning in Text-to-SQL.
  55. Corrective Self-Consistency in Text-to-SQL via Reinforcement Learning.
  56. SQLForge: Synthesizing Reliable and Diverse Data to Enhance Text-to-SQL Reasoning in LLMs.
  57. Think2SQL: Reinforce LLM Reasoning Capabilities for Text2SQL.
  58. Distill-C: Enhanced NL2SQL via Distilled Customization with LLMs.
  59. OmniSQL: Synthesizing High-quality Text-to-SQL Data at Scale.
  60. SQL-Factory: A Multi-Agent Framework for High-Quality and Large-Scale SQL Generation.
  61. Text2SQL is Not Enough: Unifying AI and Databases with TAG.
  62. Automatic database description generation for Text-to-SQL.
  63. MCTS-SQL: An Effective Framework for Text-to-SQL with Monte Carlo Tree Search.
  64. SQL-o1: A Self-Reward Heuristic Dynamic Search Method for Text-to-SQL.
  65. FEATHER-SQL: A Lightweight NL2SQL Framework with Dual-Model Collaboration Paradigm for Small Language Models.
  66. FI-NL2PY2SQL: Financial Industry NL2SQL Innovation Model Based on Python and Large Language Model.
  67. FGCSQL: A Three-Stage Pipeline for Large Language Model-Driven Chinese Text-to-SQL.
  68. Transforming Medical Data Access: The Role and Challenges of Recent Language Models in SQL Query Automation.
  69. The Dawn of Natural Language to SQL: Are We Fully Ready?
  70. Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation.
  71. Interleaving Pre-Trained Language Models and Large Language Models for Zero-Shot NL2SQL Generation.
  72. Generating Succinct Descriptions of Database Schemata for Cost-Efficient Prompting of Large Language Models.
  73. ScienceBenchmark: A Complex Real-World Benchmark for Evaluating Natural Language to SQL Systems.
  74. CodeS: Towards Building Open-source Language Models for Text-to-SQL.
  75. FinSQL: Model-Agnostic LLMs-based Text-to-SQL Framework for Financial Analysis.
  76. PURPLE: Making a Large Language Model a Better SQL Writer.
  77. METASQL: A Generate-then-Rank Framework for Natural Language to SQL Translation.
  78. Archer: A Human-Labeled Text-to-SQL Dataset with Arithmetic, Commonsense and Hypothetical Reasoning.
  79. Synthesizing Text-to-SQL Data from Weak and Strong LLMs.
  80. Understanding the Effects of Noise in Text-to-SQL: An Examination of the BIRD-Bench Benchmark.
  81. I Need Help! Evaluating LLM’s Ability to Ask for Users’ Support: A Case Study on Text-to-SQL Generation.
  82. PTD-SQL: Partitioning and Targeted Drilling with LLMs in Text-to-SQL.
  83. Improving Retrieval-augmented Text-to-SQL with AST-based Ranking and Schema Pruning.
  84. Data-Centric Text-to-SQL with Large Language Models.
  85. Research and Practice on Database Interaction Based on Natural Language Processing
  86. XiYan-SQL: A Multi-Generator Ensemble Framework for Text-to-SQL.
  87. Structure Guided Large Language Model for SQL Generation.
  88. A Plug-and-Play Natural Language Rewriter for Natural Language to SQL.
  89. RSL-SQL: Robust Schema Linking in Text-to-SQL Generation.
  90. In-Context Reinforcement Learning based Retrieval-Augmented Generation for Text-to-SQL.
  91. TrustSQL: Benchmarking Text-to-SQL Reliability with Penalty-Based Scoring.
  92. LAIA-SQL: Enhancing Natural Language to SQL Generation in Multi-Table QA via Task Decomposition and Keyword Extraction
  93. Research on Large Model Text-to-SQL Optimization Method for Intelligent Interaction in the Field of Construction Safety.
  94. SQL-GEN: Bridging the Dialect Gap for Text-to-SQL Via Synthetic Data And Model Merging.
  95. Grounding Natural Language to SQL Translation with Data-Based Self-Explanations.
  96. Towards Optimizing SQL Generation via LLM Routing.
  97. E-SQL: Direct Schema Linking via Question Enrichment in Text-to-SQL.
  98. DB-GPT: Empowering Database Interactions with Private Large Language Models.
  99. The Death of Schema Linking? Text-to-SQL in the Age of Well-Reasoned Language Models.
  100. CHESS: Contextual Harnessing for Efficient SQL Synthesis.
  101. PET-SQL: A Prompt-Enhanced Two-Round Refinement of Text-to-SQL with Cross-consistency.
  102. CoE-SQL: In-Context Learning for Multi-Turn Text-to-SQL with Chain-of-Editions.
  103. AMBROSIA: A Benchmark for Parsing Ambiguous Questions into Database Queries.
  104. Text-to-SQL Calibration: No Need to Ask—Just Rescale Model Probabilities.
  105. Few-shot Text-to-SQL Translation using Structure and Content Prompt Learning.
  106. CatSQL: Towards Real World Natural Language to SQL Applications.
  107. DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction.
  108. Data Ambiguity Strikes Back: How Documentation Improves GPT's Text-to-SQL.
  109. ACT-SQL: In-Context Learning for Text-to-SQL with Automatically-Generated Chain-of-Thought.
  110. Selective Demonstrations for Cross-domain Text-to-SQL.
  111. RESDSQL: Decoupling Schema Linking and Skeleton Parsing for Text-to-SQL.
  112. Graphix-T5: Mixing Pre-trained Transformers with Graph-Aware Layers for Text-to-SQL Parsing.
  113. Improving Generalization in Language Model-based Text-to-SQL Semantic Parsing: Two Simple Semantic Boundary-based Techniques.
  114. G3R: A Graph-Guided Generate-and-Rerank Framework for Complex and Cross-domain Text-to-SQL Generation.
  115. Importance of Synthesizing High-quality Data for Text-to-SQL Parsing.
  116. Know What I don’t Know: Handling Ambiguous and Unknown Questions for Text-to-SQL.
  117. C3: Zero-shot Text-to-SQL with ChatGPT
  118. SQLformer: Deep Auto-Regressive Query Graph Generation for Text-to-SQL Translation.

📊 Text-to-SQL Benchmark

We created a timeline of Text-to-SQL benchmark development and marked the relevant milestones. You can get more details from this chapter: 📊 Benchmark

🎯 Where Are We Going?

  • 🎯Solve Open Text-to-SQL Problem
  • 🎯Develop Cost-effective Text-to-SQL Methods
  • 🎯Make Text-to-SQL Solutions Trustworthy
  • 🎯Text-to-SQL with Ambiguous and Unspecified NL Queries
  • 🎯Adaptive Training Data Synthesis

📖 Catalog for Our Survey

You can get more information from our subsection. We introduce representative papers on related concepts:

💾 Practical Guide for Novice

📊 How to get data:

  • We collect Text-to-SQL benchmark features and download links for you. You can get more details from this chapter: Benchmark
  • The analysis code for benchmarks is available in the src/dataset_analysis directory. Benchmark analysis reports can be found in the report/ directory.

🛠️ How to build an LLM-based Text-to-SQL model:

  • Litgpt Repository Link

    This repository offers access to over 20 high-performance large language models (LLMs) with comprehensive guides for pretraining, fine-tuning, and deploying at scale. It is designed to be beginner-friendly with from-scratch implementations and no complex abstractions.

  • LLaMA-Factory Repository Link

    Unified Efficient Fine-Tuning of 100+ LLMs. Integrating various models with scalable training resources, advanced algorithms, practical tricks, and comprehensive experiment monitoring tools, this setup enables efficient and faster inference through optimized APIs and UIs.

  • Fine-tuning and In-Context learning for BIRD-SQL benchmark Repository Link

    A tutorial for both Fine-tuning and In-Context Learning is provided by the BIRD-SQL benchmark.
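
    As a rough picture of what in-context learning involves, the sketch below assembles a few-shot prompt from a schema, some demonstration pairs, and a new question. It is a generic illustration and not the prompt format used in the BIRD-SQL tutorial; the function name `build_few_shot_prompt` and the comment-style layout are assumptions made for this example.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    schema_ddl: str,
    examples: List[Tuple[str, str]],  # (question, gold SQL) demonstrations
    question: str,
) -> str:
    """Assemble a few-shot Text-to-SQL prompt as SQL comments plus queries."""
    parts = ["-- Database schema", schema_ddl.strip(), ""]
    for nl, sql in examples:
        # Each demonstration shows the model one question and its SQL answer.
        parts += [f"-- Question: {nl}", sql.strip(), ""]
    # End with the new question and an open SELECT for the model to complete.
    parts += [f"-- Question: {question}", "SELECT"]
    return "\n".join(parts)


if __name__ == "__main__":
    print(
        build_few_shot_prompt(
            "CREATE TABLE singer (name TEXT, age INTEGER);",
            [("How many singers are there?", "SELECT COUNT(*) FROM singer;")],
            "List the names of singers older than 30.",
        )
    )
```

    Fine-tuning uses the same kind of (question, SQL) pairs, but as training data rather than as demonstrations placed in the prompt.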

🔎How to evaluate your model:

We collect NL2SQL evaluation metrics for you. You can get more details from this chapter: Evaluation

  • NL2SQL360 Repository Link

    NL2SQL360 is a testbed for fine-grained evaluation of NL2SQL solutions. Our testbed integrates existing NL2SQL benchmarks, a repository of NL2SQL models, and various evaluation metrics, and aims to provide an intuitive and user-friendly platform for both standard and customized performance evaluations.

  • Test-suite-sql-eval Repository Link

    This repo contains a test suite evaluation metric for 11 text-to-SQL tasks. It is now the official metric of Spider, SParC, and CoSQL, and is also now available for Academic, ATIS, Advising, Geography, IMDB, Restaurants, Scholar, and Yelp (building on the amazing work by Catherine and Jonathan).

  • BIRD-SQL-Official Repository Link

    This is the official evaluation tool of BIRD-SQL. It is the first tool to propose the Valid Efficiency Score (VES) and to provide an official test suite.
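
For intuition about what these tools measure, the following is a minimal sketch of execution accuracy: a prediction counts as correct when it returns the same rows as the gold query on the same database. It is a simplified illustration, not the code of any tool listed above. The helper names are hypothetical, rows are compared as unordered multisets (so `ORDER BY` differences are ignored), and the database is assumed to be SQLite.

```python
import sqlite3
from collections import Counter
from typing import Iterable, Optional, Tuple


def run_query(conn: sqlite3.Connection, sql: str) -> Optional[Counter]:
    """Return the result rows as a multiset, or None if the query fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_match(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    """True when both queries run and return the same rows, ignoring order."""
    gold = run_query(conn, gold_sql)
    pred = run_query(conn, predicted_sql)
    return gold is not None and pred is not None and pred == gold


def execution_accuracy(
    conn: sqlite3.Connection, pairs: Iterable[Tuple[str, str]]
) -> float:
    """Fraction of (predicted, gold) pairs whose execution results match."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(execution_match(conn, p, g) for p, g in pairs) / len(pairs)
```

Because two different queries can return the same rows on one particular database, a single-database comparison like this can overestimate correctness; running the comparison over several database instances, as test-suite style evaluation does, reduces that risk.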

🗺️ Roadmap and Decision Flow

You can get some inspiration from the Roadmap and Decision Flow.

📱 Text-to-SQL Related Applications:

  • Chat2DB: An AI-driven database tool and GUI SQL client, supporting MySQL, Oracle, PostgreSQL, DB2, SQL Server, SQLite, H2, ClickHouse, and more.
  • DB-GPT: An AI-native data app development framework with AWEL (Agentic Workflow Expression Language) and agents.
  • Postgres.new: An in-browser Postgres sandbox with AI assistance.
  • QueryGPT: Natural Language to SQL Using Generative AI.

📮Connect with Us

Please feel free to contact us if we missed any interesting work.

📧 xliu371[at]connect.hkust-gz.edu.cn
