Skip to content

feat: new scoresql cmd#3612

Merged
jqnatividad merged 8 commits into
masterfrom
scoresql-cmd
Mar 14, 2026
Merged

feat: new scoresql cmd#3612
jqnatividad merged 8 commits into
masterfrom
scoresql-cmd

Conversation

@jqnatividad

Copy link
Copy Markdown
Collaborator

No description provided.

jqnatividad and others added 4 commits March 13, 2026 16:02
…execution

Analyzes SQL queries against stats, moarstats, and frequency caches of input CSV files
to produce a performance score (0-100) with actionable optimization suggestions.
Supports both Polars (default) and DuckDB query plan analysis.

Scoring covers type optimization, join cardinality, filter selectivity, data distribution,
and query anti-pattern detection (SELECT *, ORDER BY without LIMIT, cartesian joins, etc.).
Caches are auto-generated when missing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ustness

- Fix test feature gate to require both `polars` and `feature_capable`
  (matching main.rs registration), preventing test failures on polars-only builds
- Remove unused SqlInfo fields (_has_group_by, _referenced_tables,
  _select_columns) and the extract_select_columns function
- Improve subquery detection: check for SELECT inside parentheses instead of
  counting all SELECT occurrences, avoiding false positives from string literals
- Fix DuckDB plan table name substitution: sort replacements longest-first
  to prevent partial matches (e.g., "data" matching inside "data2")
- Surface user-visible warnings (wwarn!) when cache generation fails,
  not just log::warn, so users know scoring may be inaccurate

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Track single-quote state in subquery detector to avoid false positives
  from SELECT appearing inside string literals (e.g. WHERE col = '(SELECT ...')
- Add word-boundary check after SELECT to prevent matching identifiers
  like SELECTIVITY
- Deduplicate table-name replacements to avoid redundant substitutions
  when a table name happens to equal an alias

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…handling

Add a preceding word-boundary check so identifiers like PRESELECT are
not falsely detected as subqueries. Also add a comment explaining why
SQL '' escaped quotes work correctly via toggle symmetry.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a new scoresql command to analyze a SQL query against CSV-derived caches (stats/frequency/index) and emit a performance score plus optimization suggestions, aligning with qsv’s “smart command” approach to pre-flight analysis.

Changes:

  • Adds a new scoresql CLI command (help text, command enum wiring, module export) gated behind polars + feature_capable.
  • Implements src/cmd/scoresql.rs to generate query plans (Polars/DuckDB), load or generate caches, compute a score breakdown, and output human-readable or JSON reports.
  • Adds an integration test suite for scoresql and wires it into the test harness under the appropriate feature gates.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
tests/tests.rs Registers the new test_scoresql module under feature gating.
tests/test_scoresql.rs Adds integration tests validating text/JSON output, suggestions, joins, LIMIT behavior, and output file support.
src/main.rs Exposes scoresql in the command list, command enum, and dispatcher.
src/cmd/mod.rs Exports the new scoresql module under polars + feature gating.
src/cmd/scoresql.rs Implements the scoring command: cache handling, plan generation, heuristic SQL parsing, scoring, and formatting.

Comment thread src/cmd/scoresql.rs Outdated
Comment thread src/cmd/scoresql.rs Outdated
Comment thread src/cmd/scoresql.rs
Comment thread src/cmd/scoresql.rs
Comment thread src/cmd/scoresql.rs
…SQL parsing, regex

* Remove incorrect first-line skip in load_stats_cache (stats JSONL has no metadata header)
* Sort alias replacements by length descending to prevent partial matches (_t_1 inside _t_10)
* Add split_on_operators() to correctly parse column names from `col=value` patterns
* Use word-boundary regex in get_duckdb_plan instead of naive String::replace

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new feature-gated qsv scoresql command that analyzes a SQL query against CSV-derived caches to produce a performance score, query plan, and optimization suggestions before execution.

Changes:

  • Introduces scoresql command implementation (Polars plan analysis by default, optional DuckDB plan mode).
  • Registers the new command in the CLI command enum/help output under polars + feature_capable.
  • Adds an integration test suite covering JSON/human output structure and several scoring heuristics.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
tests/tests.rs Wires test_scoresql module into the integration test harness behind feature flags.
tests/test_scoresql.rs New integration tests for scoresql output structure and key suggestion heuristics.
src/main.rs Registers scoresql as an enabled command and routes execution to cmd::scoresql::run.
src/cmd/scoresql.rs Implements query parsing heuristics, cache loading/generation, plan extraction, scoring, and output formatting.
src/cmd/mod.rs Exposes the scoresql module behind feature gates.

Comment thread src/cmd/scoresql.rs Outdated
Comment thread src/cmd/scoresql.rs
Comment thread src/cmd/scoresql.rs
Comment thread src/cmd/scoresql.rs Outdated
Comment thread src/cmd/scoresql.rs
Comment thread src/cmd/scoresql.rs
Comment thread src/cmd/scoresql.rs
Comment thread src/cmd/scoresql.rs Outdated
Comment thread src/cmd/scoresql.rs
jqnatividad and others added 3 commits March 13, 2026 20:58
…Polars plan

- Use labeled loop (`'on_clause`) in `extract_join_columns` so that
  stop-keywords (WHERE, ORDER, etc.) break the outer token loop, not
  just the inner operator-split loop.
- Apply word-boundary regex replacement in `get_polars_plan` (matching
  the existing `get_duckdb_plan` strategy) to prevent alias partial
  matches (e.g., alias "data" inside "metadata_col").

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…, cache freshness, dedup

* Use >= in is_cache_fresh to avoid churn on coarse-timestamp filesystems
* Deduplicate extracted join columns to prevent double-counting
* Pass --delimiter to qsv stats/frequency when generating caches
* Use canonical path for cache generation (matches cache lookup path)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ter assertions

- Use sort_unstable_by/dedup_by with case-insensitive comparison to match
  the existing comment's claim of case-insensitive deduplication
- Add debug_assert!(delim.is_ascii()) guards in stats/freq cache generation
  to document the ASCII delimiter assumption

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jqnatividad jqnatividad merged commit 6d57023 into master Mar 14, 2026
16 of 17 checks passed
@jqnatividad jqnatividad deleted the scoresql-cmd branch March 14, 2026 01:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants