feat(describegpt): scoresql integration#3624
Merged
Merged
Conversation
…neration Score LLM-generated SQL queries with `qsv scoresql` before execution, iteratively asking the LLM to improve queries that fall below a quality threshold. This produces better SQL and fewer failed executions. New flags: --no-score-sql, --score-threshold (default 50), --score-max-retries (default 3). Adds 8 integration tests covering polars and DuckDB backends. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…placement in scoresql
- Track best SQL as a template with {INPUT_TABLE_NAME} placeholder instead of
doing reverse replacement which corrupts SQL when file stem is a common word
- Use saturating_add for max_retries loop bound to prevent overflow
- Add explicit table name instructions to LLM refinement/error prompts
- Cap score_max_retries to 100 to prevent unreasonable values
- Add skip messages to scoresql tests for CI visibility
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…eplacement Replace blind `scoring_sql.replace(file_stem, INPUT_TABLE_NAME)` with a regex that only substitutes `file_stem` after FROM/JOIN keywords, preventing corruption of column names or literals that contain the file stem as a substring. Also warn when --score-max-retries is silently clamped to 100. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… in scoresql - Move Regex::new() before the retry loop since file_stem is invariant - Extend pattern to match INTO/UPDATE keywords (not just FROM/JOIN) - Handle quoted/backtick-delimited table names in the replacement regex - Add safety comment about INPUT_TABLE_NAME and regex replacement chars Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
Adds scoresql-based validation/refinement to describegpt’s SQL-RAG execution path, with new CLI flags and integration tests to exercise the behavior.
Changes:
- Introduces
--no-score-sql,--score-threshold, and--score-max-retriesoptions and wires them into the--prompt + --sql-resultsSQL execution flow. - Adds a scoring loop that runs
qsv scoresql --jsonand optionally re-prompts the LLM to iteratively improve low-scoring SQL. - Adds new integration tests covering default scoring, disabling scoring, threshold/retry behavior, and DuckDB-backed scoring.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
src/cmd/describegpt.rs |
Adds CLI flags plus SQL scoring/refinement logic using the scoresql subcommand before SQL execution. |
tests/test_describegpt.rs |
Adds integration tests for the new scoresql/scoring flags and retry/threshold behavior (including DuckDB cases). |
- Add success assertions to all scoresql integration tests to catch early failures before checking stderr - Change threshold from 100 to 101 in high-threshold tests to eliminate flakiness (a perfect 100/100 score is possible) - Fix misleading "Attempt" wording to "Retry" in refinement prompt Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
Adds scoresql validation/iteration to describegpt so LLM-generated SQL can be scored and optionally refined before execution when --prompt is used with --sql-results.
Changes:
- Introduces new CLI flags to control SQL scoring (
--no-score-sql,--score-threshold,--score-max-retries) and wires them into the SQL-execution path. - Implements a scoring loop that calls the
scoresqlsubcommand, logs score/attempts, and optionally re-prompts the LLM to improve low-scoring SQL. - Adds integration tests covering default scoring behavior, disabling scoring, thresholds, retry limits, and DuckDB scoring scenarios.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
src/cmd/describegpt.rs |
Adds CLI options + implements scoresql-based scoring/refinement before executing generated SQL. |
tests/test_describegpt.rs |
Adds integration tests validating the new scoring flags and retry/threshold behavior (including DuckDB cases). |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.