feat(describegpt): add --markdown-template for customizable Markdown output#3834

jqnatividad · 2026-05-07T20:34:39Z

Summary

Adds --markdown-template <file> to describegpt so users can customize Markdown output without recompiling, with an embedded resources/describegpt_md_defaults.toml default that reproduces the legacy format!() wrapper byte-for-byte.
Refactors the dictionary table from formatters::format_dictionary_markdown into a dictionary_md_body_template, exposing structured entries + addl_col_names plus custom Mini Jinja filters (pipe_escape, br_replace, human_count, dict_cell, humanize_examples) so the per-field layout is also TOML-driven.
Threads a single SharedRenderCtx (one attribution + one timestamp) through both render helpers per phase, caches the Mini Jinja Environment in a LazyLock, and locks in byte-identity to the legacy output via three embedded tests.
Drive-by: fixes a pre-existing OnceLock::set().unwrap() race in get_prompt_file that could panic the loser when two threads raced to initialize the prompt file (latent until two unit tests started exercising it concurrently).

Test plan

cargo build --locked --bin qsv -F all_features
cargo test describegpt -F all_features — 64 integration tests pass
cargo test cmd::describegpt::tests -F all_features — 16 embedded module tests pass (including 3 new byte-identity tests)
QSV_TEST_DESCRIBEGPT=1 cargo test describegpt -F all_features — 64 CI tests pass against live LM Studio
cargo clippy --bin qsv -F all_features --no-deps — clean
qsv --generate-help-md — docs/help/describegpt.md regenerated
roborev review sniff: refactor to just use the csv crate and drop the qsv-sniffer crate #1976 — addressed all 5 Low findings, closed

🤖 Generated with Claude Code

…output Adds a `--markdown-template <file>` flag and a default `resources/describegpt_md_defaults.toml` so users can customize the Markdown layout without recompiling. Per-inference-type templates (`dictionary_md_template`, `description_md_template`, `tags_md_template`, `custom_prompt_md_template`) drive the H1/REASONING/TOKEN-USAGE wrapper, and a separate `dictionary_md_body_template` drives the per-field table that fills `{{ llm_response }}` for the Dictionary kind. The wrapper templates and the body template share one Mini Jinja Environment with custom filters: `pipe_escape`, `br_replace`, `human_count`, `dict_cell`, `humanize_examples`. Defaults reproduce the legacy hardcoded `format!()` output byte-for-byte; two embedded tests lock that in. Also fixes a pre-existing race in `get_prompt_file` where two threads racing to initialize `PROMPT_FILE` would panic the loser via `OnceLock::set().unwrap()`; now the loser silently uses the winner's value (same pattern as the new `get_md_template_file`). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Thread a single SharedRenderCtx (attribution + timestamp) through render_dictionary_md_body and render_markdown_template so the body footer's attribution and the wrapper's `{{ timestamp }}` / `{{ generated_by_signature }}` are byte-identical in the same phase. - Cache the Mini Jinja Environment in a LazyLock so filters (pipe_escape, br_replace, human_count, dict_cell, humanize_examples) are registered once instead of on every render call. - Add `default_wrapper_templates_are_byte_identical` test plus a TOML comment locking the four wrapper templates in sync — drift fails loudly instead of silently diverging. - Reuse the shared timestamp from the SharedRenderCtx (no second `chrono::Utc::now()` call per dictionary phase). - Add a doc comment on PROMPT_FILE / MD_TEMPLATE_FILE explaining the test-binary singleton constraint and what tests touching custom templates would need. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

codacy-production · 2026-05-07T20:35:46Z

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics -11 complexity

Metric Results

Complexity -11

View in Codacy

_{NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer}
_{TIP This summary will be updated as you push new changes.}

Copilot

Pull request overview

Adds a TOML-driven MiniJinja templating layer for qsv describegpt Markdown output so users can customize the rendered Markdown (including the dictionary table body) without recompiling, while preserving byte-identical legacy output via default templates and tests.

Changes:

Introduces --markdown-template <file> and a loader that parses/caches a Markdown template TOML (or uses an embedded default).
Refactors dictionary Markdown rendering into MiniJinja templates + custom filters, using a shared per-phase render context (single attribution + timestamp).
Adds resources/describegpt_md_defaults.toml, updates generated help docs, and adds tests asserting byte-identity with the legacy Markdown output.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File	Description
src/cmd/describegpt/formatters.rs	Removes the legacy hardcoded dictionary Markdown formatter (dictionary Markdown now template-driven).
src/cmd/describegpt.rs	Adds `--markdown-template`, template loading/rendering helpers, shared render context, and byte-identity tests; also fixes a `OnceLock::set().unwrap()` race in prompt loading.
resources/describegpt_md_defaults.toml	New embedded default Markdown template TOML (wrapper + dictionary-body templates, with documented variables/filters).
docs/help/describegpt.md	Regenerated help docs to include `--markdown-template`.

- USAGE help and generated docs now mention dictionary_md_body_template and the available custom Mini Jinja filters, pointing users at the inline variable/filter docs in the default TOML. - Markdown template TOML parse error now includes the source path (or "<default embedded TOML>") so users can tell which file failed. - Metadata fields (name/description/author/version) on MarkdownTemplateFile are now `#[serde(default)]` so users can supply a minimal --markdown-template file without copying the metadata header. - Replaced struct-level `#[allow(dead_code)]` with per-field allows on only the genuinely-unused metadata fields, so future genuinely unused additions get caught. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Added MarkdownTemplateOverride struct with per-field Option<String> fields and an `apply_to` overlay that fills any omitted field from the embedded defaults. Now a user can drop in a --markdown-template TOML containing only the templates they want to change instead of copying the entire default file. USAGE updated to mention this. - New `markdown_template_override_falls_back_per_field` test locks in that omitted templates fall back to embedded defaults. - Inlined `source_label` inside the toml::from_str map_err closure using `args.flag_markdown_template.as_deref().unwrap_or("...")`, removing the cross-branch tuple binding that was fragile under future refactors. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CI uses `cargo +nightly fmt --check`, which respects the unstable `struct_field_align_threshold = 20` in rustfmt.toml. My local stable rustfmt silently ignores that option, so a few struct field declarations landed with hand-aligned colons that nightly rejects. Run nightly fmt to bring the file in line with what CI expects. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…indows CI) The markdown_default_template_byte_identical_to_legacy and dictionary_body_default_template_byte_identical_to_legacy tests fail on Windows runners because: - `* text=auto` in .gitattributes lets git check out *.toml with CRLF on Windows; `include_str!` then bundles the embedded default `describegpt_md_defaults.toml` with CRLF in its template strings. - Mini Jinja preserves the source line endings verbatim, so the rendered Markdown carries CRLF on Windows. - The legacy `format!()` baseline the tests compare against is LF only. Two-pronged fix: 1. Code-level: normalize CRLF -> LF in `get_md_template_file` before handing content to `toml::from_str`, so both the embedded default and any user-supplied --markdown-template TOML render with LF regardless of the platform git checked them out on. 2. .gitattributes: add `*.toml text eol=lf` so the TOML files are committed AND checked out as LF on every platform. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…arent dir (#3838) * fix(moarstats): retry on stats coverage mismatch + fsync joined CSV parent dir Two conservative guards for the Linux-only flake on test_moarstats::moarstats_join_type_full_runs_and_writes_bivariate (CI run 25545197594, surfaced via PR #3834). The macOS fix (#3830) and Windows fix (#3831) hardened the parent-side fsync. Linux still surfaces the same symptom — bivariate output missing secondary columns — at a much lower frequency. The most plausible remaining gap is the `qsv stats`-on-joined-CSV subprocess handoff: the spawned child has been observed to produce stats records for fewer columns than the joined CSV actually has. Two guards: 1. Retry once on stats-coverage mismatch. After `qsv stats` runs on the joined CSV, cross-check that every column in the joined CSV's header appears in the stats output's `field` column. On mismatch, re-fsync the joined CSV and re-run `qsv stats` once. Only fail loud if the mismatch persists. Self-heals the rare race; surfaces a precise error if a different root cause ever appears. 2. Fsync the joined CSV's parent directory after `join_datasets_internal` returns. `fsync(file)` on Linux doesn't flush parent-dir metadata, and rare FS/timing edge cases (overlayfs, fuse, network mounts) can affect read-after-write visibility for the follow-up subprocess. New helper `util::sync_directory()` is a no-op on Windows, where dir-handle `FlushFileBuffers` would error. Also includes `tools/repro_moarstats_linux_flake.sh` — a parallel-worker harness that replicates the failing test as a shell loop, optionally adds page-cache pressure (DD_LOAD=1), and classifies failures into "retry caught it" / "retry didn't catch it" / "different bug" buckets to confirm root cause on a Linux VM. Verified on macOS: `cargo build --locked --bin qsv -F all_features` and `cargo clippy --locked --bin qsv -F all_features -- -D warnings` clean; `cargo t -F all_features --test tests test_moarstats::` 80/80 pass at --test-threads=8; failing test in a 30x stress loop all pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * address copilot review: best-effort dir fsync, error propagation, script hardening All six Copilot comments on PR #3838 applied: src/util.rs - sync_directory: errors no longer propagated. Directory fsync is not uniformly supported across Linux filesystems (FAT, some FUSE/overlay/ network mounts return EINVAL/EOPNOTSUPP); a defensive guard layered on top of file-level fsync must NOT fail the caller. Errors are now logged at debug level and swallowed. Signature changed from CliResult<()> to () to make the contract explicit. - New unit test test_sync_directory_smoke covering both the existing-dir and missing-dir paths (Unix-only). src/cmd/moarstats.rs - Call site updated for the new sync_directory signature. - Stats-output reader now propagates csv parse errors instead of silently dropping rows via filter_map(Result::ok). A malformed/ truncated stats row now surfaces as "Failed to parse joined-stats row N" rather than as a misleading downstream "missing columns" assertion — which is exactly the diagnostic mode this fix is trying to eliminate. tools/repro_moarstats_linux_flake.sh - FIFO is now created inside a real mktemp -d directory, replacing the inherently racy mktemp -u. - cleanup trap now removes LOAD_DIR (when DD_LOAD=1) and FIFO_DIR so temp resources are not leaked on interrupts/early exits. - bivariate awk check now distinguishes "header malformed" (exit 2 -> FAIL_OTHER) from "coverage failed" (exit 1 -> FAIL_OLD_BIVARIATE), so a truncated bivariate header is not silently misclassified as a column-drop bug. Verified on macOS: cargo build --locked clean, cargo clippy -- -D warnings clean, all 80 test_moarstats tests pass at --test-threads=8, the new test_sync_directory_smoke unit test passes, smoke-run of repro script green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

jqnatividad and others added 3 commits May 7, 2026 16:22

chore: rust fmt

b44a150

github-advanced-security AI found potential problems May 7, 2026

View reviewed changes

Comment thread src/cmd/describegpt.rs Dismissed

Comment thread src/cmd/describegpt.rs Dismissed

jqnatividad requested a review from Copilot May 7, 2026 20:36

Copilot started reviewing on behalf of jqnatividad May 7, 2026 20:36 View session

Copilot AI reviewed May 7, 2026

View reviewed changes

Comment thread src/cmd/describegpt.rs Outdated

Comment thread docs/help/describegpt.md Outdated

Comment thread src/cmd/describegpt.rs Outdated

Comment thread src/cmd/describegpt.rs Outdated

jqnatividad and others added 4 commits May 7, 2026 16:44

jqnatividad merged commit a8d5cc5 into master May 8, 2026
15 of 17 checks passed

jqnatividad deleted the describegpt-markdown-template branch May 8, 2026 08:55

jqnatividad mentioned this pull request May 9, 2026

fix(moarstats): retry on stats coverage mismatch + fsync joined CSV parent dir #3838

Merged

6 tasks

BrewTestBot mentioned this pull request May 18, 2026

qsv 20.1.0 Homebrew/homebrew-core#283447

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(describegpt): add --markdown-template for customizable Markdown output#3834