feat(describegpt): scoresql integration#3624

Merged: jqnatividad merged 5 commits into master from describegpt-scoresql-integration on Mar 17, 2026

Conversation

@jqnatividad

Collaborator

No description provided.

jqnatividad and others added 4 commits March 17, 2026 13:50
…neration

Score LLM-generated SQL queries with `qsv scoresql` before execution,
iteratively asking the LLM to improve queries that fall below a quality
threshold. This produces better SQL and fewer failed executions.

New flags: --no-score-sql, --score-threshold (default 50),
--score-max-retries (default 3). Adds 8 integration tests covering
polars and DuckDB backends.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
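
The score-then-refine flow described in this commit message can be sketched roughly as below. This is a minimal illustration, not the code in `src/cmd/describegpt.rs`: the `Scored` struct, the `score_and_refine` function, and the two closures are hypothetical names. The `score` closure stands in for running `qsv scoresql`, and `refine` stands in for re-prompting the LLM. Using `u32` for scores and keeping the best-scoring candidate are assumptions.

```rust
/// Result of the scoring loop: the chosen SQL, its score, and retries spent.
#[derive(Debug, PartialEq)]
pub struct Scored {
    pub sql: String,
    pub score: u32,
    pub retries_used: u32,
}

/// Hypothetical sketch: score `initial_sql`; while the best score so far is
/// below `threshold`, request a refined query, at most `max_retries` times.
/// `score` stands in for `qsv scoresql`, `refine` for the LLM re-prompt.
pub fn score_and_refine<S, R>(
    initial_sql: &str,
    threshold: u32,
    max_retries: u32,
    mut score: S,
    mut refine: R,
) -> Scored
where
    S: FnMut(&str) -> u32,
    R: FnMut(&str, u32) -> String,
{
    let mut current = initial_sql.to_string();
    let mut last_score = score(&current);
    let mut best = Scored {
        sql: current.clone(),
        score: last_score,
        retries_used: 0,
    };

    while best.score < threshold && best.retries_used < max_retries {
        // Ask for an improved query, passing along the score it must beat.
        current = refine(&current, last_score);
        last_score = score(&current);
        best.retries_used += 1;

        // Only replace the remembered query when the new one scores higher.
        if last_score > best.score {
            best.sql = current.clone();
            best.score = last_score;
        }
    }
    best
}
```

With a threshold of 50 and a retry limit of 3 (the defaults named above), a query that already scores 50 or more is returned without any retries. A query that never reaches the threshold is returned after three retries, with the best score seen.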
…placement in scoresql

- Track best SQL as a template with {INPUT_TABLE_NAME} placeholder instead of
  doing reverse replacement which corrupts SQL when file stem is a common word
- Use saturating_add for max_retries loop bound to prevent overflow
- Add explicit table name instructions to LLM refinement/error prompts
- Cap score_max_retries to 100 to prevent unreasonable values
- Add skip messages to scoresql tests for CI visibility

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
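
Three of the fixes listed in this commit message (the retry cap, the overflow-safe loop bound, and the placeholder template) might look something like the following. This is a sketch under stated assumptions: the constant and function names are hypothetical, `u32` is an assumed type for the retry count, and only the `{INPUT_TABLE_NAME}` placeholder text and the cap of 100 come from the commit message.

```rust
/// Upper bound on retries; the commit message caps the flag at 100.
/// The constant name is hypothetical.
const MAX_SCORE_RETRIES: u32 = 100;

/// Clamp a user-supplied retry count. The boolean reports whether the
/// value was reduced, so a caller could surface that to the user.
pub fn clamp_retries(requested: u32) -> (u32, bool) {
    let clamped = requested.min(MAX_SCORE_RETRIES);
    (clamped, clamped != requested)
}

/// Total attempts = 1 initial attempt + `max_retries` retries.
/// `saturating_add` pins the result at `u32::MAX` instead of overflowing
/// (plain `+ 1` would panic in debug builds and wrap in release builds).
pub fn total_attempts(max_retries: u32) -> u32 {
    max_retries.saturating_add(1)
}

/// Fill the placeholder in a stored SQL template with a concrete table name.
/// Keeping the template avoids having to reverse-replace the table name later.
pub fn render_template(template: &str, table: &str) -> String {
    template.replace("{INPUT_TABLE_NAME}", table)
}
```

Storing the best query as a template means the concrete table name is substituted only in the forward direction. A file stem such as `order` or `data` therefore never has to be searched for inside the SQL text.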
…eplacement

Replace blind `scoring_sql.replace(file_stem, INPUT_TABLE_NAME)` with a
regex that only substitutes `file_stem` after FROM/JOIN keywords, preventing
corruption of column names or literals that contain the file stem as a
substring.

Also warn when --score-max-retries is silently clamped to 100.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
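
To illustrate why the blind replacement corrupts SQL, here is a dependency-free sketch of the keyword-restricted rule. Note that the commit itself uses a regex. This example swaps in a plain token scan from the standard library so it runs without the `regex` crate, and the function name `to_template` is hypothetical. It replaces `file_stem` only when it is the token immediately following `FROM` or `JOIN`.

```rust
const INPUT_TABLE_NAME: &str = "{INPUT_TABLE_NAME}";

/// Hypothetical std-only stand-in for the regex described in the commit:
/// replace `file_stem` with the placeholder only when it directly follows
/// a FROM or JOIN keyword. Whitespace and all other tokens are preserved.
pub fn to_template(sql: &str, file_stem: &str) -> String {
    let mut out = String::with_capacity(sql.len());
    let mut after_keyword = false;

    // Each piece is a token plus at most one trailing whitespace character.
    for piece in sql.split_inclusive(char::is_whitespace) {
        let word = piece.trim_end();
        if word.is_empty() {
            // Extra whitespace: copy it and keep the keyword state.
            out.push_str(piece);
            continue;
        }

        // Ignore trailing punctuation such as `order;` or `order,`.
        let core = word.trim_end_matches(|c: char| matches!(c, ';' | ',' | ')'));
        if after_keyword && core == file_stem {
            out.push_str(INPUT_TABLE_NAME);
            out.push_str(&piece[core.len()..]); // punctuation + whitespace
        } else {
            out.push_str(piece);
        }

        after_keyword =
            word.eq_ignore_ascii_case("FROM") || word.eq_ignore_ascii_case("JOIN");
    }
    out
}
```

For a file stem of `order`, the query `SELECT order_id FROM order WHERE status = 'order'` keeps its column name and string literal intact and only the table reference changes. A blind `str::replace` would rewrite all three occurrences. This sketch does not handle quoted table names, and it would still rewrite a `FROM order` sequence that appears inside a string literal.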
… in scoresql

- Move Regex::new() before the retry loop since file_stem is invariant
- Extend pattern to match INTO/UPDATE keywords (not just FROM/JOIN)
- Handle quoted/backtick-delimited table names in the replacement regex
- Add safety comment about INPUT_TABLE_NAME and regex replacement chars

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Contributor

Pull request overview

Adds scoresql-based validation/refinement to describegpt’s SQL-RAG execution path, with new CLI flags and integration tests to exercise the behavior.

Changes:

  • Introduces --no-score-sql, --score-threshold, and --score-max-retries options and wires them into the --prompt + --sql-results SQL execution flow.
  • Adds a scoring loop that runs qsv scoresql --json and optionally re-prompts the LLM to iteratively improve low-scoring SQL.
  • Adds new integration tests covering default scoring, disabling scoring, threshold/retry behavior, and DuckDB-backed scoring.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| src/cmd/describegpt.rs | Adds CLI flags plus SQL scoring/refinement logic using the scoresql subcommand before SQL execution. |
| tests/test_describegpt.rs | Adds integration tests for the new scoresql/scoring flags and retry/threshold behavior (including DuckDB cases). |

Comment thread tests/test_describegpt.rs Outdated
Comment thread tests/test_describegpt.rs
Comment thread tests/test_describegpt.rs
Comment thread src/cmd/describegpt.rs
Comment thread tests/test_describegpt.rs
Comment thread tests/test_describegpt.rs
Comment thread tests/test_describegpt.rs Outdated
Comment thread src/cmd/describegpt.rs
Comment thread tests/test_describegpt.rs
- Add success assertions to all scoresql integration tests to catch
  early failures before checking stderr
- Change threshold from 100 to 101 in high-threshold tests to eliminate
  flakiness (a perfect 100/100 score is possible)
- Fix misleading "Attempt" wording to "Retry" in refinement prompt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Contributor

Pull request overview

Adds scoresql validation/iteration to describegpt so LLM-generated SQL can be scored and optionally refined before execution when --prompt is used with --sql-results.

Changes:

  • Introduces new CLI flags to control SQL scoring (--no-score-sql, --score-threshold, --score-max-retries) and wires them into the SQL-execution path.
  • Implements a scoring loop that calls the scoresql subcommand, logs score/attempts, and optionally re-prompts the LLM to improve low-scoring SQL.
  • Adds integration tests covering default scoring behavior, disabling scoring, thresholds, retry limits, and DuckDB scoring scenarios.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/cmd/describegpt.rs | Adds CLI options + implements scoresql-based scoring/refinement before executing generated SQL. |
| tests/test_describegpt.rs | Adds integration tests validating the new scoring flags and retry/threshold behavior (including DuckDB cases). |

Comment thread tests/test_describegpt.rs
Comment thread tests/test_describegpt.rs
Comment thread src/cmd/describegpt.rs
@jqnatividad merged commit e7ed7f0 into master on Mar 17, 2026
20 of 21 checks passed
@jqnatividad deleted the describegpt-scoresql-integration branch on March 17, 2026 19:22
2 participants