This repository contains the official implementation of the CIKM 2025 paper "Multi-Turn Interactions for Text-to-SQL with Large Language Models".
The following figure illustrates the overall workflow.
Download the raw datasets from their original sources, and organize them under dataset/ with the following structure:
dataset
├── spider (about 1018MB)
│ └── database (ln -s ./test_database ./database)
│ ├── aan_1
│ └── ...
├── Spider-DK
│ └── database
│ ├── new_concert_singer
│ └── ...
├── Spider-Realistic
│ ├── dev.json
│ └── ...
├── Spider-Syn
│ ├── dev.json
│ └── ...
├── bird (about 45G)
│ ├── column_meaning.json
│ └── dev_databases
│ ├── california_schools
│ └── ...
├── bird-FinC
│ ├── financial_corrected.json
│ └── ...
├── bird-minidev
│ └── dev_databases
│ ├── california_schools
│ └── ...
└── spider2-sqlite
└── spider2-localdb
├── AdventureWorks.sqlite
└── ...
We provide preprocessed artifacts as tarballs under targz/ (optional).
For database preprocessing, you can either use the provided artifacts or run the scripts yourself.
To use the provided artifacts:
tar -xzf targz/cols_info.tar.gz
tar -xzf targz/dbs_info.tar.gzThis will produce database/cols_info/ and database/dbs_info/.
Alternatively, you can run preprocess/{db}_step1_cols_info.py and preprocess/{db}_step2_dbs_info.py to process them yourself.
Tested environment:
OS: Ubuntu 22.04.2 LTS
CPU: Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz
Memory: 128 GB
Python: 3.11.5
ElasticSearch: 8.14.1 (JDK 21.0.2)
ChromaDB: 0.5.4
Install dependencies:
pip install -r requirements.txtPlease refer to tool/client_es.py.
You need to build the indices before running inference.
export OPENAI_API_KEY=xxx
export PYTHONPATH=$(pwd):$PYTHONPATH
# spider
python preprocess/spider_step3_indexing_es.py
python preprocess/spider_step4_vectorization.py
# bird
python preprocess/bird_step3_indexing_es.py
python preprocess/bird_step4_vectorization.pyRun the following commands to verify that the tools are working correctly:
python tool/spider_search.py
python tool/spider_execution.pyRun interactive_text_to_sql.py to start an interactive session (see the script for available arguments).
e.g.
python interactive_text_to_sql.py --dataset "spider-dev" --model_name "gpt-4o-2024-05-13"
python interactive_text_to_sql.py --dataset "bird-dev" --model_name "gpt-4o-2024-05-13"We provide two ways to run evaluation:
We release the final prediction files as targz/result.tar.gz. Extract it in the repo root (it contains a top-level result/ directory):
tar -xzf targz/result.tar.gzMake sure the datasets are downloaded and placed under dataset/ as described above, then run the evaluation scripts:
# Spider (EM/EX + TS where applicable)
bash script/eval_all_spider.sh
# BIRD (ours)
bash script/eval_all_bird_no_evi.sh
bash script/eval_all_bird_with_evi.shIf you ran interactive inference yourself, you should have a directory like save-crossdb-infer-dialog/... that stores the full multi-turn interaction logs. You can convert those logs into the result/ format expected by the evaluation scripts:
python make_final_res.pyWe also release the full interaction logs as targz/save-crossdb-infer-dialog-clean.tar.gz (optional):
tar -xzf targz/save-crossdb-infer-dialog-clean.tar.gzmake_final_res.py expects the logs under save-crossdb-infer-dialog/ by default. If you extract the released logs, rename the directory accordingly (or adjust the paths in make_final_res.py):
mv save-crossdb-infer-dialog-clean save-crossdb-infer-dialogNote: make_final_res.py currently expects the model directory name gpt-4o-2024-05-13 under each run folder. If your logs are saved under a different model name (e.g., gpt-4o), either rename that directory or update the paths in make_final_res.py.
After result/ is generated, run the same scripts in script/ as in Option A.
-
Spider Test Suite (TS) evaluation:
script/eval_all_spider.shusesevaluation/test_suite_sql_eval/testsuite_databaseevaluation/test_suite_sql_eval/testsuite_tables.json
These are not included in this repo. Please obtain them from the official Test Suite repo and copy/symlink them into the paths above (the upstream repo is
https://github.com/taoyds/test-suite-sql-eval). For example:
git clone https://github.com/taoyds/test-suite-sql-eval.git /tmp/test-suite-sql-eval
ln -s /tmp/test-suite-sql-eval/database evaluation/test_suite_sql_eval/testsuite_database
cp /tmp/test-suite-sql-eval/tables.json evaluation/test_suite_sql_eval/testsuite_tables.json- BIRD-FinC evaluation:
make_final_res.pyexpects the BIRD-FinC JSON files fromthe-effects-of-noise-in-text-to-SQLunder the parent directory:
cd ..
git clone https://github.com/niklaswretblad/the-effects-of-noise-in-text-to-SQL.git
cd Interactive-Text-to-SQL-publish@inproceedings{10.1145/3746252.3761052,
author = {Xiong, Guanming and Bao, Junwei and Jiang, Hongfei and Song, Yang and Zhao, Wen},
title = {Multi-Turn Interactions for Text-to-SQL with Large Language Models},
year = {2025},
isbn = {9798400720406},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3746252.3761052},
doi = {10.1145/3746252.3761052},
abstract = {This study explores text-to-SQL parsing by leveraging the powerful reasoning capabilities of large language models (LLMs). Despite recent advancements, existing LLM-based methods are still inefficient and struggle to handle cases with wide tables effectively. Furthermore, current interaction-based approaches either lack a step-by-step, interpretable SQL generation process or fail to provide a universally applicable interaction design. To address these challenges, we introduce Interactive-T2S, a framework that generates SQL queries through direct interactions with databases. This framework includes four general tools that facilitate proactive and efficient information retrieval by the LLM. Additionally, we have developed detailed exemplars to demonstrate the step-wise reasoning processes within our framework. Our approach achieves advanced performance on the Spider and BIRD datasets as well as their variants. Notably, we obtain state-of-the-art results on the BIRD leaderboard under the setting without oracle knowledge, demonstrating the effectiveness of our method. Code and data are available at: https://github.com/JimXiongGM/Interactive-Text-to-SQL.},
booktitle = {Proceedings of the 34th ACM International Conference on Information and Knowledge Management},
pages = {3560–3570},
numpages = {11},
keywords = {large language model, low-resource, text-to-sql},
location = {Seoul, Republic of Korea},
series = {CIKM '25}
}