This new GitHub repository offers a benchmark for Small Language Models (SLMs) in SQL generation. It features a well-defined semantic model, comprising verified SQL queries and sample values, which enables the generation of accurate SQL from natural language questions for evaluation.
Initial benchmarks were performed using Qwen3:30B-a3b running locally; the remote o3-mini and GPT-4o models (hosted in Azure) are used for validation.
I am personally interested in the performance of small language models.
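
To make the idea of a semantic model with verified queries and sample values more concrete, here is a minimal sketch in Python. It is not the repository's actual format or scoring code: the `SEMANTIC_MODEL` layout, the `orders` table, and the helpers `build_prompt`, `execution_match`, and `make_demo_db` are all hypothetical names chosen for illustration. The sketch assumes that a generated query is judged by running it against a database and comparing its rows with those of the verified query, which is one common way to score text-to-SQL output and may differ from what the benchmark does.

```python
import sqlite3
from collections import Counter

# Hypothetical semantic model: the field names below are illustrative,
# not the repository's actual schema.
SEMANTIC_MODEL = {
    "table": "orders",
    "columns": {
        "region": {"type": "TEXT", "sample_values": ["EMEA", "APAC"]},
        "amount": {"type": "REAL", "sample_values": [120.0, 75.5]},
    },
    "verified_queries": [
        {
            "question": "What is the total order amount per region?",
            "sql": "SELECT region, SUM(amount) FROM orders GROUP BY region",
        }
    ],
}


def build_prompt(model, question):
    """Render the semantic model as context for a text-to-SQL prompt."""
    lines = [f"Table: {model['table']}"]
    for name, col in model["columns"].items():
        lines.append(f"- {name} ({col['type']}), e.g. {col['sample_values']}")
    lines.append(f"Question: {question}")
    return "\n".join(lines)


def execution_match(conn, generated_sql, verified_sql):
    """Return True if both queries yield the same rows, ignoring row order."""
    try:
        got = conn.execute(generated_sql).fetchall()
    except sqlite3.Error:
        return False  # SQL that fails to run counts as a miss
    expected = conn.execute(verified_sql).fetchall()
    return Counter(got) == Counter(expected)


def make_demo_db():
    """Create a small in-memory database with made-up rows for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("EMEA", 120.0), ("APAC", 75.5), ("EMEA", 30.0)],
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    entry = SEMANTIC_MODEL["verified_queries"][0]
    print(build_prompt(SEMANTIC_MODEL, entry["question"]))
    # Stand-in for a model's answer; a real run would call the model here.
    candidate = (
        "SELECT region, SUM(amount) AS total FROM orders "
        "GROUP BY region ORDER BY total DESC"
    )
    print(execution_match(conn, candidate, entry["sql"]))
```

Comparing result rows rather than SQL text means a query that is phrased differently from the verified one still counts as correct when it returns the same data. The comparison ignores row order, so this sketch would need adjusting for questions where the ordering itself matters.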