feat(describegpt): add --markdown-template for customizable Markdown output#3834
Merged
Conversation
…output
Adds a `--markdown-template <file>` flag and a default
`resources/describegpt_md_defaults.toml` so users can customize the
Markdown layout without recompiling. Per-inference-type templates
(`dictionary_md_template`, `description_md_template`, `tags_md_template`,
`custom_prompt_md_template`) drive the H1/REASONING/TOKEN-USAGE wrapper,
and a separate `dictionary_md_body_template` drives the per-field table
that fills `{{ llm_response }}` for the Dictionary kind. The wrapper
templates and the body template share one Mini Jinja Environment with
custom filters: `pipe_escape`, `br_replace`, `human_count`, `dict_cell`,
`humanize_examples`. Defaults reproduce the legacy hardcoded `format!()`
output byte-for-byte; two embedded tests lock that in.
Also fixes a pre-existing race in `get_prompt_file` where two threads
racing to initialize `PROMPT_FILE` would panic the loser via
`OnceLock::set().unwrap()`; now the loser silently uses the winner's
value (same pattern as the new `get_md_template_file`).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Thread a single SharedRenderCtx (attribution + timestamp) through
render_dictionary_md_body and render_markdown_template so the body
footer's attribution and the wrapper's `{{ timestamp }}` /
`{{ generated_by_signature }}` are byte-identical in the same phase.
- Cache the Mini Jinja Environment in a LazyLock so filters
(pipe_escape, br_replace, human_count, dict_cell, humanize_examples)
are registered once instead of on every render call.
- Add `default_wrapper_templates_are_byte_identical` test plus a TOML
comment locking the four wrapper templates in sync — drift fails
loudly instead of silently diverging.
- Reuse the shared timestamp from the SharedRenderCtx (no second
`chrono::Utc::now()` call per dictionary phase).
- Add a doc comment on PROMPT_FILE / MD_TEMPLATE_FILE explaining the
test-binary singleton constraint and what tests touching custom
templates would need.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Up to standards ✅🟢 Issues
|
| Metric | Results |
|---|---|
| Complexity | -11 |
NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.
Contributor
There was a problem hiding this comment.
Pull request overview
Adds a TOML-driven MiniJinja templating layer for qsv describegpt Markdown output so users can customize the rendered Markdown (including the dictionary table body) without recompiling, while preserving byte-identical legacy output via default templates and tests.
Changes:
- Introduces
--markdown-template <file>and a loader that parses/caches a Markdown template TOML (or uses an embedded default). - Refactors dictionary Markdown rendering into MiniJinja templates + custom filters, using a shared per-phase render context (single attribution + timestamp).
- Adds
resources/describegpt_md_defaults.toml, updates generated help docs, and adds tests asserting byte-identity with the legacy Markdown output.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/cmd/describegpt/formatters.rs | Removes the legacy hardcoded dictionary Markdown formatter (dictionary Markdown now template-driven). |
| src/cmd/describegpt.rs | Adds --markdown-template, template loading/rendering helpers, shared render context, and byte-identity tests; also fixes a OnceLock::set().unwrap() race in prompt loading. |
| resources/describegpt_md_defaults.toml | New embedded default Markdown template TOML (wrapper + dictionary-body templates, with documented variables/filters). |
| docs/help/describegpt.md | Regenerated help docs to include --markdown-template. |
- USAGE help and generated docs now mention dictionary_md_body_template and the available custom Mini Jinja filters, pointing users at the inline variable/filter docs in the default TOML. - Markdown template TOML parse error now includes the source path (or "<default embedded TOML>") so users can tell which file failed. - Metadata fields (name/description/author/version) on MarkdownTemplateFile are now `#[serde(default)]` so users can supply a minimal --markdown-template file without copying the metadata header. - Replaced struct-level `#[allow(dead_code)]` with per-field allows on only the genuinely-unused metadata fields, so future genuinely unused additions get caught. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Added MarkdownTemplateOverride struct with per-field Option<String>
fields and an `apply_to` overlay that fills any omitted field from
the embedded defaults. Now a user can drop in a --markdown-template
TOML containing only the templates they want to change instead of
copying the entire default file. USAGE updated to mention this.
- New `markdown_template_override_falls_back_per_field` test locks in
that omitted templates fall back to embedded defaults.
- Inlined `source_label` inside the toml::from_str map_err closure
using `args.flag_markdown_template.as_deref().unwrap_or("...")`,
removing the cross-branch tuple binding that was fragile under
future refactors.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI uses `cargo +nightly fmt --check`, which respects the unstable `struct_field_align_threshold = 20` in rustfmt.toml. My local stable rustfmt silently ignores that option, so a few struct field declarations landed with hand-aligned colons that nightly rejects. Run nightly fmt to bring the file in line with what CI expects. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…indows CI)
The markdown_default_template_byte_identical_to_legacy and
dictionary_body_default_template_byte_identical_to_legacy tests fail on
Windows runners because:
- `* text=auto` in .gitattributes lets git check out *.toml with CRLF
on Windows; `include_str!` then bundles the embedded default
`describegpt_md_defaults.toml` with CRLF in its template strings.
- Mini Jinja preserves the source line endings verbatim, so the
rendered Markdown carries CRLF on Windows.
- The legacy `format!()` baseline the tests compare against is LF only.
Two-pronged fix:
1. Code-level: normalize CRLF -> LF in `get_md_template_file` before
handing content to `toml::from_str`, so both the embedded default
and any user-supplied --markdown-template TOML render with LF
regardless of the platform git checked them out on.
2. .gitattributes: add `*.toml text eol=lf` so the TOML files are
committed AND checked out as LF on every platform.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
6 tasks
jqnatividad
added a commit
that referenced
this pull request
May 9, 2026
…arent dir (#3838) * fix(moarstats): retry on stats coverage mismatch + fsync joined CSV parent dir Two conservative guards for the Linux-only flake on test_moarstats::moarstats_join_type_full_runs_and_writes_bivariate (CI run 25545197594, surfaced via PR #3834). The macOS fix (#3830) and Windows fix (#3831) hardened the parent-side fsync. Linux still surfaces the same symptom — bivariate output missing secondary columns — at a much lower frequency. The most plausible remaining gap is the `qsv stats`-on-joined-CSV subprocess handoff: the spawned child has been observed to produce stats records for fewer columns than the joined CSV actually has. Two guards: 1. Retry once on stats-coverage mismatch. After `qsv stats` runs on the joined CSV, cross-check that every column in the joined CSV's header appears in the stats output's `field` column. On mismatch, re-fsync the joined CSV and re-run `qsv stats` once. Only fail loud if the mismatch persists. Self-heals the rare race; surfaces a precise error if a different root cause ever appears. 2. Fsync the joined CSV's parent directory after `join_datasets_internal` returns. `fsync(file)` on Linux doesn't flush parent-dir metadata, and rare FS/timing edge cases (overlayfs, fuse, network mounts) can affect read-after-write visibility for the follow-up subprocess. New helper `util::sync_directory()` is a no-op on Windows, where dir-handle `FlushFileBuffers` would error. Also includes `tools/repro_moarstats_linux_flake.sh` — a parallel-worker harness that replicates the failing test as a shell loop, optionally adds page-cache pressure (DD_LOAD=1), and classifies failures into "retry caught it" / "retry didn't catch it" / "different bug" buckets to confirm root cause on a Linux VM. Verified on macOS: `cargo build --locked --bin qsv -F all_features` and `cargo clippy --locked --bin qsv -F all_features -- -D warnings` clean; `cargo t -F all_features --test tests test_moarstats::` 80/80 pass at --test-threads=8; failing test in a 30x stress loop all pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * address copilot review: best-effort dir fsync, error propagation, script hardening All six Copilot comments on PR #3838 applied: src/util.rs - sync_directory: errors no longer propagated. Directory fsync is not uniformly supported across Linux filesystems (FAT, some FUSE/overlay/ network mounts return EINVAL/EOPNOTSUPP); a defensive guard layered on top of file-level fsync must NOT fail the caller. Errors are now logged at debug level and swallowed. Signature changed from CliResult<()> to () to make the contract explicit. - New unit test test_sync_directory_smoke covering both the existing-dir and missing-dir paths (Unix-only). src/cmd/moarstats.rs - Call site updated for the new sync_directory signature. - Stats-output reader now propagates csv parse errors instead of silently dropping rows via filter_map(Result::ok). A malformed/ truncated stats row now surfaces as "Failed to parse joined-stats row N" rather than as a misleading downstream "missing columns" assertion — which is exactly the diagnostic mode this fix is trying to eliminate. tools/repro_moarstats_linux_flake.sh - FIFO is now created inside a real mktemp -d directory, replacing the inherently racy mktemp -u. - cleanup trap now removes LOAD_DIR (when DD_LOAD=1) and FIFO_DIR so temp resources are not leaked on interrupts/early exits. - bivariate awk check now distinguishes "header malformed" (exit 2 -> FAIL_OTHER) from "coverage failed" (exit 1 -> FAIL_OLD_BIVARIATE), so a truncated bivariate header is not silently misclassified as a column-drop bug. Verified on macOS: cargo build --locked clean, cargo clippy -- -D warnings clean, all 80 test_moarstats tests pass at --test-threads=8, the new test_sync_directory_smoke unit test passes, smoke-run of repro script green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
--markdown-template <file>todescribegptso users can customize Markdown output without recompiling, with an embeddedresources/describegpt_md_defaults.tomldefault that reproduces the legacyformat!()wrapper byte-for-byte.formatters::format_dictionary_markdowninto adictionary_md_body_template, exposing structuredentries+addl_col_namesplus custom Mini Jinja filters (pipe_escape,br_replace,human_count,dict_cell,humanize_examples) so the per-field layout is also TOML-driven.SharedRenderCtx(one attribution + one timestamp) through both render helpers per phase, caches the Mini JinjaEnvironmentin aLazyLock, and locks in byte-identity to the legacy output via three embedded tests.OnceLock::set().unwrap()race inget_prompt_filethat could panic the loser when two threads raced to initialize the prompt file (latent until two unit tests started exercising it concurrently).Test plan
cargo build --locked --bin qsv -F all_featurescargo test describegpt -F all_features— 64 integration tests passcargo test cmd::describegpt::tests -F all_features— 16 embedded module tests pass (including 3 new byte-identity tests)QSV_TEST_DESCRIBEGPT=1 cargo test describegpt -F all_features— 64 CI tests pass against live LM Studiocargo clippy --bin qsv -F all_features --no-deps— cleanqsv --generate-help-md—docs/help/describegpt.mdregeneratedsniff: refactor to just use the csv crate and drop the qsv-sniffer crate #1976 — addressed all 5 Low findings, closedFor #3735
🤖 Generated with Claude Code