describegpt: split run_inference_options into per-phase helpers#3722
Merged
Conversation
…pers Decompose the ~875-line run_inference_options into eight focused helpers while preserving all behavior: - build_inference_messages (was nested get_messages closure) promoted to file-scope free fn. - run_dictionary_phase - run_description_phase - run_tags_phase - run_prompt_phase (returns a PromptPhaseOutcome struct carrying the completion, has_sql_query flag, mutated session state, and the system_prompt that execute_sql_query_phase reuses for refinement calls) - execute_sql_query_phase — extracts the ~400-line SQL-results block (file validation, fence extraction, scoresql retry loop, DuckDB/Polars dispatch, session tracking) - finalize_structured_output — JSON/TOON emission at end of run run_inference_options is now a thin orchestrator: create client, check model, dispatch phases in order, apply max-tokens gate, route SQL, finalize output, persist session. Minor consolidation along the way: get_prompt_file(args)? is called once at the orchestrator (for base_url) instead of 4 times across the phases; get_output_format(args)? is called once and passed through instead of re-computed per phase. Error-ordering is preserved within the phases; the only visible timing change is that a bad prompt-file now errors before the first LLM call rather than after the dictionary phase — arguably better UX (no tokens wasted on failure). 64 integration + 13 unit tests pass, clippy clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Up to standards ✅🟢 Issues
|
Three findings from the review of 514cacc: 1. **Medium — max-tokens regression for dictionary-only runs.** The original code left `completion_response` as `CompletionResponse::default()` when only the dictionary ran, so `--dictionary --max-tokens=N` with dictionary tokens ≥ N never tripped the gate. My `last_completion = data_dict.clone()` after the dictionary phase broke that invariant. Dropped the assignment and added a comment documenting why — the description/tags/prompt branches still overwrite `last_completion` correctly, and dictionary-only runs pass through as before. 2. **Low — 4 redundant String allocations per error path.** `execute_sql_query_phase` was calling `normalized_session_path.map(String::from).as_ref()` five times in error branches, each allocating a fresh String just to get `Option<&String>`. Bind once at the top of the function and pass `session_path_owned.as_ref()` everywhere. 3. **Low — session_len sentinel documentation.** `flag_session_len == 0` collapses "user did not set it" and "user set 0" into the same default (10). Preserved from pre-refactor behavior but worth a comment; added one in `run_prompt_phase`. 13 unit + 64 integration tests pass, clippy clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
Refactors describegpt’s large run_inference_options function into a thin orchestrator that delegates to focused per-phase helpers, while consolidating repeated prompt-file/output-format lookups.
Changes:
- Promotes message-construction into
build_inference_messagesand splits inference into per-phase helpers (run_dictionary_phase,run_description_phase,run_tags_phase,run_prompt_phase). - Extracts the SQL execution/refinement path into
execute_sql_query_phaseand addsPromptPhaseOutcometo carry prompt-phase results forward. - Centralizes structured output emission via
finalize_structured_outputand computesbase_url/output_formatonce at the orchestrator.
Five findings, four of them variants of the same issue:
Four cache-invalidation sites in execute_sql_query_phase skipped
invalidation when cache_type == Fresh. But CacheType::Fresh still
writes the completion into the underlying Disk/Redis cache (see the
second get_*_completion call inside get_cached_completion at ~L2102,
comment on CacheType enum confirms: "Forces fresh API call but still
updates cache"). So a --fresh run that produced unusable SQL (failed
fence extraction, DuckDB execution error, Polars error) left the bad
Prompt response in the cache, to be reused by the next non-fresh run.
Changed the guard at all five sites from
cache_type != &Fresh && cache_type != &None
to
cache_type != &None
so invalidation fires whenever caching is active. This is a
pre-existing bug that became more visible after the phase split —
bundled here since it's a one-line correctness fix at call sites
the reviewer pointed to. None of this PR's structural changes affect
when or whether these invalidations fire.
Also taking a perf micro-fix from the review in build_inference_messages:
replaced `json!(messages)` (which re-serializes through the json! macro
even though messages is already Vec<serde_json::Value>) with
serde_json::Value::Array(messages), which is a direct constructor.
13 unit + 64 integration tests pass, clippy clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three new findings from the latest Copilot review:
1. **dictionary_cache_type respects --no-cache.** In run_dictionary_phase,
`--prompt` + `--fresh` was forcing Disk/Redis for the dictionary
phase even when cache_type is None (user passed --no-cache). That
silently re-enabled caching. Now gated on
`cache_type != &CacheType::None` so --no-cache remains authoritative.
Pre-existing behavior.
2. **Polars .csv sibling writability probe.** execute_sql_query_phase
validated only the primary sql_results path, but the Polars path
renames to `{sql_results}.csv` post-run. A pre-existing read-only
`.csv` sibling would silently break the rename. Loop over
`[sql_results_path, sql_results_path.with_extension("csv")]` and
fail early if either exists read-only. DuckDB writes to the primary
path so the .csv check is no-op there.
3. **TSV-aware "Output written to ..." status message.** Per-phase TSV
helpers derive `{filestem}.{kind}.tsv` paths via get_tsv_output_path,
so "Output written to {output}" is misleading for --format tsv.
Surface the derivation pattern in the status message instead.
13 unit + 64 integration tests pass, clippy clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Continuation of the
describegptdecomposition work (follows merged #3720 / #3721). The ~875-linerun_inference_optionsis now a thin orchestrator over eight focused helpers:build_inference_messages(was a nestedget_messagesclosure) — promoted to file-scope free fn.run_dictionary_phase,run_description_phase,run_tags_phase— one per inference kind.run_prompt_phase— handles session loading, relevance-check, sliding-window, LLM call, and session-state mutation. Returns aPromptPhaseOutcomestruct carrying the completion,has_sql_queryflag, mutated session state, and the system prompt that the SQL phase reuses.execute_sql_query_phase— extracts the ~400-line SQL-results block (file validation, fence extraction, scoresql retry loop, DuckDB/Polars dispatch, session tracking).finalize_structured_output— JSON/TOON emission at end of run.run_inference_optionsis now: create client,check_model, dispatch phases in order, apply max-tokens gate against the last phase's tokens, route SQL, finalize output, persist session.Minor consolidations
get_prompt_file(args)?.base_urlis now computed once at the orchestrator instead of 4 times (once per phase). One visible behavioral delta: a malformed prompt-file now errors before the first LLM call rather than after the Dictionary phase — strictly better UX (no tokens wasted on failure).get_output_format(args)?is computed once and passed to each phase instead of re-derived per phase.Test plan
cargo build --locked --bin qsv -F all_featurescargo clippy -F all_features --bin qsvcargo t describegpt -F all_features— 64 integration tests passcargo test --locked -F all_features --bin qsv describegpt::tests— 13 unit tests passdescribegpt --all --prompt "..." --sql-results out.csvagainst a sample CSV to exercise Dictionary → Description → Tags → Prompt → SQL pipeline end-to-end.Deferred (from the handoff plan)
Still open:
rundecomposition (~700 lines),cache.rssubmodule extraction (gated on#[io_cached]proc-macro requalification), Tier 3 polish bundle (triplicated dictionary formatters, SQL-fence regex hoist, named constants,.clone()trims, parser unit tests),try_fix_jsonedge cases.🤖 Generated with Claude Code