merge Transpose PR#3

Merged
jqnatividad merged 2 commits into dathere:master from mintyplanet:transpose
Dec 27, 2020

@jqnatividad

Collaborator

    transpose command to transpose rows/columns of CSV data.

    PR BurntSushi#137
@jqnatividad jqnatividad merged commit 3fd2edd into dathere:master Dec 27, 2020
jqnatividad added a commit that referenced this pull request Dec 27, 2025
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
jqnatividad added a commit that referenced this pull request Dec 27, 2025
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
jqnatividad added a commit that referenced this pull request Jan 1, 2026
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
jqnatividad added a commit that referenced this pull request Feb 20, 2026
All 160 frequency tests pass. The diagnostics about `qsv_bin` are unrelated to my changes.
Changes:
- Fixed misleading comment: removed "non-positive" from the safety comment about weight filtering since `debug_assert!(is_finite())` only checks for NaN/Inf, not non-positive values
- Added clarifying comments on descending `partition_point` calls to document that null is placed after entries with equal weight/count (acknowledging the behavior change from the original linear search)

Address review findings (job 259)

All 160 frequency tests pass. Here's a summary of the changes:
Changes:
- Added `#[inline]` back to `weighted_add` to enable inlining into function pointer call sites, mitigating the indirect-call overhead noted in finding #3
- Added comment on `debug_assert!` in `weighted_add` clarifying that upstream validation already filters invalid weights (finding #2)
- Added CHANGELOG entry under `[Unreleased]` documenting the `partition_point` optimization and the **BREAKING** behavioral change in null tie-breaking order for `frequency --null-sorted` (finding #1)

Address review findings (job 257)

All 160 frequency tests pass.
Changes:
- Added `debug_assert` in `weighted_add` to verify weight is finite and positive, making the invariant self-documenting and consistent with the safety comment at line 2244 (the upstream filter at line 2739 already correctly skips non-positive weights before calling `weighted_add`)

Address review findings (job 256)

All 4 tests pass. Build succeeds.
Changes:
- Removed "matching original behavior" from comments on null insertion with ties (it's actually a behavior change from the `position()` to `partition_point()` switch)
- Fixed safety comment to mention non-positive weights are also filtered during accumulation (not just NaN/Inf)
- Removed `#[inline]` from `weighted_add` since it's called through a function pointer and the annotation is misleading

Address review findings (job 252)

All 4 tests pass with exact position assertions.
Changes:
- Replace range assertions with exact position assertions in all 4 tie-breaking tests to match the deterministic `partition_point` tie-breaking policy
- Weighted desc/asc tests now assert null_pos == 2 (exact)
- Unweighted desc test asserts null_pos == 3 (null placed after all tied entries)
- Unweighted asc test asserts null_pos == 1 (null placed before all tied entries), differentiating it from the desc test
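The tie-breaking policy these tests pin down can be sketched with `slice::partition_point`. This is a minimal illustration, not the code in `frequency.rs`: the function names and the sample counts are made up here, and only the predicates (`>=` for descending, `<` for ascending) follow the notes above.

```rust
/// Index where a null entry with `null_count` would be inserted into `counts`,
/// which must already be sorted in descending order.
/// Ties go to the existing entries, so null lands after equal counts.
fn null_pos_desc(counts: &[u64], null_count: u64) -> usize {
    counts.partition_point(|&c| c >= null_count)
}

/// Same idea for ascending order: `<` stops at the first equal count,
/// so null lands before entries with the same count.
fn null_pos_asc(counts: &[u64], null_count: u64) -> usize {
    counts.partition_point(|&c| c < null_count)
}
```

With hypothetical counts `[5, 3, 3]` (descending) and `[1, 3, 3]` (ascending) and a null count of 3, the two functions return 3 and 1 respectively, mirroring the exact positions asserted in the unweighted tests.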

Address review findings (job 251)

All 160 frequency tests pass.
Changes:
- Reverted `#[inline(always)]` to `#[inline]` on `weighted_add` since inlining through function pointers isn't guaranteed anyway
- Added 4 tests for null insertion with ties: weighted desc, weighted asc, unweighted desc, unweighted asc — covering the `partition_point` edge case where null has the same count/weight as other entries

Address review findings (job 249)

All 156 frequency tests pass. The diagnostics about `qsv_bin` are pre-existing issues unrelated to this change.
Changes:
- Fix ascending `partition_point` predicate to use `<` instead of `<=` for both weighted and unweighted null insertion, restoring the original tie-breaking behavior where null is placed before entries with equal weight/count

Address review findings (job 247)

All 156 frequency tests pass. The diagnostic warnings about `qsv_bin` are pre-existing and unrelated to our changes.
Changes:
- Add `debug_assert!` to verify sort-order invariant before `partition_point` in weighted null-insertion path
- Add `debug_assert!` to verify sort-order invariant before `partition_point` in unweighted null-insertion path
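A sort-order guard of this kind might look like the sketch below. `partition_point` only gives a meaningful answer on a slice that is partitioned by the predicate, so the assertion checks the descending order first. The function name, the `(String, u64)` entry shape, and the `"(NULL)"` label are assumptions for illustration, not taken from the qsv source.

```rust
/// Insert a null entry into `counts` (sorted by count, descending) and
/// return the position it was inserted at.
fn insert_null_desc(counts: &mut Vec<(String, u64)>, null_count: u64) -> usize {
    // Debug-only check: every adjacent pair must be in descending order,
    // otherwise the binary search below could pick an arbitrary position.
    debug_assert!(
        counts.windows(2).all(|w| w[0].1 >= w[1].1),
        "counts must be sorted in descending order before partition_point"
    );
    let pos = counts.partition_point(|(_, c)| *c >= null_count);
    counts.insert(pos, ("(NULL)".to_string(), null_count));
    pos
}
```

Because `debug_assert!` compiles away in release builds, the check costs nothing in the optimized binary while still catching a broken invariant under `cargo test`.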

Address review findings (job 245)

All 156 frequency tests pass.
Changes:
- Add `debug_assert!` checking all weights in `counts_final` are finite, not just the null weight, for `partition_point` correctness (finding #3)
- Add comments noting `field_buffer` borrows are transient and safe to reuse across iterations in both weighted and unweighted ignore-case paths (finding #1)

Address review findings (job 243)

No clippy warnings for frequency.rs. All changes are clean.
Changes:
- Add `debug_assert!(null_weight_val.is_finite())` before weighted `partition_point` calls to guard against NaN float values breaking binary search
- Add safety comment for unweighted `partition_point` noting u64 counts are always finite
- Change `weighted_add` from `#[inline]` to `#[inline(always)]` to ensure inlining in the hot path through function pointers
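The reason NaN matters for the weighted path is that every ordered comparison against NaN is false, so a NaN weight breaks the "all true, then all false" shape that binary search relies on. A minimal sketch of the guard, with a hypothetical function name and made-up weights:

```rust
/// Position for a null entry among `weights`, sorted descending.
/// Hypothetical helper; only the `is_finite` guards mirror the notes above.
fn weighted_null_pos_desc(weights: &[f64], null_weight: f64) -> usize {
    // NaN or infinite values would make the predicate below unreliable.
    debug_assert!(null_weight.is_finite(), "null weight must be finite");
    debug_assert!(
        weights.iter().all(|w| w.is_finite()),
        "all weights must be finite for partition_point"
    );
    weights.partition_point(|&w| w >= null_weight)
}
```

Integer counts need no such guard, which is why the unweighted path only carries a comment.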
jqnatividad added a commit that referenced this pull request Feb 24, 2026
Syntax check passed. All changes are complete.
Changes:
- Replace macOS `read -t` fallback with `head -c 65536 | jq` pipeline to fix silent failure when `timeout` is unavailable (finding 1)
- Move version-change note from header preamble to Tool Discovery section where it's more contextually relevant (finding 4)

Address review findings (job 436)

Script passes syntax check. This is a shell script, not a Rust source file, so the standard `cargo build`/`cargo test` commands aren't relevant here. The change is minimal and self-contained.
Changes:
- Added `-n 65536` to `read` builtin in the macOS fallback branch to enforce the same 64KB size limit as the `timeout` branch, addressing the size guard inconsistency

Address review findings (job 435)

Syntax check passed. This is a shell script only — no Rust build or tests needed for these changes.
Changes:
- Use bash `read -t 5` fallback instead of bare `head` on systems without `timeout` (macOS) to prevent indefinite stdin blocking
- Emit diagnostic `additionalContext` message when `CLAUDE_PLUGIN_ROOT` is unset, aiding troubleshooting
- Replace instruction-to-AI phrasing ("Inform the user...") with neutral factual messages in JSON output

Address review findings (job 433)

Script syntax is valid. All three review findings are addressed.
Changes:
- Add `command -v timeout` check with fallback to plain `head` for macOS compatibility (issues #1/#3)
- Guard against empty/unset `CLAUDE_PLUGIN_ROOT` before `cd` to prevent unexpected `$HOME` resolution (issue #2)

Address review findings (job 432)

Script syntax is valid (no output = no errors).
Changes:
- Add 5-second timeout to stdin read to prevent indefinite blocking if no input is provided
- Guard against deploying CLAUDE.md into the plugin's own directory tree
- Replace hardcoded version "v16.1" with version-agnostic wording in cowork template
- Add `stats` (with extended stats) to the memory-intensive commands list in cowork template

Address review findings (job 430)

Script syntax is valid.
Changes:
- Guard `jq` parse of stdin against truncated JSON: redirect stderr and fall back to empty `CWD` on failure, so `set -e` won't abort on malformed input
- Replace `realpath` with POSIX-portable `cd "$CWD" && pwd -P` for symlink resolution, ensuring compatibility on minimal macOS and CI images without GNU coreutils

Address review findings (job 429)

All changes look correct. Since this is a shell script and markdown file (no Rust code changes), there's no build or test to run.
Changes:
- Limit stdin read to 64KB (`head -c 65536`) to prevent hangs on malformed/endless input
- Resolve CWD with `realpath` to prevent path traversal via symlinks
- Add `QSV_NO_COWORK_SETUP=1` env var opt-out mechanism
- Wrap `cp` in error handling to produce a friendly JSON message instead of failing with `set -e`
- Add version note and opt-out instructions to the cowork-CLAUDE.md template header

Address review findings (job 427)

Script syntax is valid (no output means no errors).
Changes:
- Redirect `jq`-missing diagnostic from stderr to stdout so the hook framework can surface it as `additionalContext` to the agent

Address review findings (job 426)

Both files validate correctly. The diagnostic errors about `qsv_bin` are pre-existing and unrelated to these changes.
Changes:
- Use here-string (`<<<`) instead of `echo | jq` to avoid escape sequence mangling in JSON input
- Add `jq` availability check with friendly message instead of cryptic hook error
- Remove misleading `"matcher": "startup"` from SessionStart hook config
- Use `jq -n` to construct output JSON safely, preventing malformed JSON from paths with special characters
- Remove unverified `QSV_MCP_OPERATION_TIMEOUT_MS` / `qsv_config` references from cowork-CLAUDE.md template
jqnatividad added a commit that referenced this pull request Mar 2, 2026
All 421 tests pass, 0 failures. The change is correct.
Changes:
- Fix Unicode truncation fast-path to use UTF-16 length as a cheap guard (a string's codepoint count never exceeds its UTF-16 length, so a string within the limit in UTF-16 units is also within the limit in codepoints), performing the expensive `Array.from()` codepoint conversion only when the UTF-16 length exceeds the limit
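The same cheap-guard idea can be sketched in Rust, used here only to keep the examples on this page in one language; the change itself is in the TypeScript MCP server and uses `Array.from()`. In Rust the cheap measure is the UTF-8 byte length rather than the UTF-16 length, and the function name is hypothetical.

```rust
/// Truncate `s` to at most `max` code points without splitting a character.
fn truncate_codepoints(s: &str, max: usize) -> &str {
    // Cheap guard: a string of n bytes holds at most n code points,
    // so short strings skip the per-character walk entirely.
    if s.len() <= max {
        return s;
    }
    // Slow path: find the byte offset of the code point at index `max`.
    // If there is none, the string already fits.
    match s.char_indices().nth(max) {
        Some((byte_idx, _)) => &s[..byte_idx],
        None => s,
    }
}
```

The guard and the truncation measure different units, but the guard is safe because the cheap measure is an upper bound on the expensive one.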

Address review findings (job 606)

All 72 tests pass (0 failures), including the 3 new tests for missing params.
Changes:
- Check for `null`/`undefined` params explicitly before string coercion in `handleLogCall`, returning clear "is required" error messages (finding #1)
- Trim and strip newlines from log messages before writing, preventing multi-line log entries and inconsistent whitespace (findings #2, #3)
- Added tests for missing `entry_type`, missing `message`, and entirely empty params (finding #5)

Address review findings (job 607)

All 418 tests pass, including the new one.
Changes:
- Add test for newline-only message (`'\n\n'`) confirming it fails the non-empty string check and is rejected

Address review findings (job 609)

All 74 tests pass (0 failures), including all the new and existing `handleLogCall` tests.
Changes:
- Log `catch` block now writes error details to stderr via `console.error` instead of silently swallowing
- Added `--` separator before the message argument in `qsv log` CLI call to prevent messages starting with `-` from being misinterpreted as flags
- Documented newline collapsing behavior in the tool description ("Newlines are collapsed to spaces")
- Added test for non-string type coercion (`{ entry_type: 123, message: true }`) confirming `String()` coercion behavior

Address review findings (job 610)

All 420 tests pass.
Changes:
- Include truncated error message in the success result returned to the agent (not just stderr), so the agent has actionable context when `qsv_log` write fails
- Add test for non-string message coercion with valid `entry_type` to verify `String()` coercion works for the message path

Address review findings (job 611)

All 420 tests pass. The Rust diagnostics are pre-existing and unrelated to this change.
Changes:
- Added `assert.ok(!result.isError)` to the `handleLogCall` non-string message coercion test to explicitly verify the result is not an error, making the test intent clearer

Address review findings (job 613)

All 420 tests pass, 0 failures. The changes are verified.
Changes:
- Added comment on `--` separator in `handleLogCall` args explaining it guards against messages starting with `-` being parsed as flags (addresses medium finding)
- Added `config.qsvValidation.valid` skip guard to `handleLogCall coerces non-string message` test so it properly tests the success path instead of passing accidentally via error swallowing (addresses low finding #4)
- Added assertion that success response doesn't contain "warning" to confirm actual success vs swallowed error

Address review findings (job 615)

No CLAUDE.md changes needed for the `--` removal. All changes are complete and tests pass.
Changes:
- Remove unnecessary `--` end-of-options sentinel from `qsv log` args — `qsv log` uses docopt variadic `[<message>...]` which handles this correctly, and messages always start with `[entry_type]` so they can never be misinterpreted as flags
- Fix Unicode-safe truncation using `Array.from()` instead of `String.slice()` to avoid splitting surrogate pairs in non-ASCII messages
- Add throttling guidance to server instructions ("Avoid excessive logging — for simple interactions, a single user_prompt + result_summary pair is enough")
- Add test for the `handleLogCall` error-swallowing catch path using a non-existent working directory

Address review findings (job 616)

The change looks correct. The length check and truncation now both operate on codepoints consistently.
Changes:
- Fix Unicode truncation length mismatch: use codepoint count (`Array.from(sanitized).length`) for both the gate condition and the truncation, avoiding inconsistency between UTF-16 `.length` and codepoint-aware `Array.from().slice()`

Address review findings (job 618)

All 421 tests pass, 0 failures. All `handleLogCall` tests pass including the updated write-failure test.
Changes:
- Reworded catch-path message from misleading `"Logged ... (warning: write failed: ...)"` to clearer `"Log write failed (non-fatal): ... Workflow continues."` (issue 1)
- Added fast-path optimization for Unicode truncation: only call `Array.from()` when `sanitized.length > MAX_LOG_MESSAGE_LEN`, avoiding unnecessary codepoint conversion on short messages (issue 3)
- Updated test assertions to match the new error message wording
jqnatividad added a commit that referenced this pull request Mar 2, 2026
* feat(mcp): add qsv_log core tool for agent-initiated reproducibility logging

Enable agents to write structured entries (user_prompt, agent_reasoning,
agent_action, result_summary, note) to the qsv audit log (qsvmcp.log)
with u- prefixed UUIDs, distinct from automatic s-/e- audit entries.
Automatic audit logging is skipped for qsv_log calls to avoid recursion.
Messages are truncated at 4096 chars and logging failures never break
the workflow. Server instructions updated to guide agents on when/how
to log for third-party reproducibility.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Address review findings (job 619)

All 421 tests pass, 0 failures. The change is correct.
Changes:
- Fix Unicode truncation fast-path to use UTF-16 length as a cheap guard (a string's codepoint count never exceeds its UTF-16 length, so a string within the limit in UTF-16 units is also within the limit in codepoints), performing the expensive `Array.from()` codepoint conversion only when the UTF-16 length exceeds the limit

* fix(mcp): address Copilot review findings for qsv_log

- Move skipAuditLog from "Key Constants" to a behavior note in CLAUDE.md
  (it's a local variable, not a module-level constant)
- Reorder enum and LOG_ENTRY_TYPES Set to match description order
  (reasoning before action)
- Add unique temp dir + cleanup to coercion test to prevent log
  file accumulation in OS temp root

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
jqnatividad added a commit that referenced this pull request Apr 23, 2026
…ls (#3734)

* feat(generators): detect required options from Usage: line

Both help_markdown_gen.rs and mcp_skills_gen.rs now identify options
shown outside [options]/[...] groups in the USAGE's `Usage:` section
(e.g. `qsv implode [options] -k <keys> -v <value>`) and mark them
accordingly.

- `docs/help/*.md`: required options get ` **(required)**` appended to
  their description column in the options table.
- `.claude/skills/qsv/*.json`: option entries gain `"required": true`
  when the flag is required. Optional options continue to emit nothing
  (the field is skipped when absent).

A small wrinkle worth noting: qsv-docopt's Parser does not always emit
Atom::Short entries paired with the Long atom for the `-k, --keys`
declaration style, so we can't rely on its pairing to expand short↔long
forms. Both generators do their own pairing pass by scanning the
options sections for `-X, --xxx` declarations.

Closes roborev review #1618 findings #3 and #4 at a project-wide
(generator) level. Findings #1 (empty positional arg descriptions) and
#2 (unfenced CSV examples) remain as separate work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: regenerate help markdown and MCP skills for required-option markers

Commands with required options in their Usage: line now show them:
- applydp: --new-column, --replacement, --formatstr
- apply, describegpt, fetchpost, implode, joinp, luau, py, split:
  various required options previously unmarked

All regenerated via `qsv --generate-help-md` and `qsv --update-mcp-skills`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(generators): only mark an option required when it's required in all Usage variants

The previous heuristic produced seven false positives flagged by roborev
review #1624. The detector now:

1. Computes a per-Usage-variant required set (tokens outside `[...]` AND
   outside any `(A | B)` alternative group), then takes the intersection
   across all non-`--help` Usage lines. An option must be required in
   every variant to be marked globally required.
2. Handles `(A | B | C)` alternative groups by masking them out entirely —
   inside alternatives no individual token is required.
3. Expands short→long aliases per Usage line, before intersection, so
   `-n`'s use as a positional-style flag on one Usage variant doesn't
   leak into `--no-headers` as globally required.
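Steps 1 and 2 can be sketched as follows. This is a simplified illustration, not the generator code: the function names are hypothetical, every `(...)` group is masked whether or not it contains a `|`, `--help` lines are not filtered, and the short-to-long alias expansion from step 3 is left out.

```rust
use std::collections::BTreeSet;

/// Flags that appear outside any `[...]` or `(...)` group in one Usage line.
fn required_in_variant(usage: &str) -> BTreeSet<String> {
    let mut depth: u32 = 0;
    let mut token = String::new();
    let mut required = BTreeSet::new();
    // The trailing space flushes the last token.
    for ch in usage.chars().chain(std::iter::once(' ')) {
        let is_open = ch == '[' || ch == '(';
        let is_close = ch == ']' || ch == ')';
        if is_open || is_close || ch.is_whitespace() {
            // A token just ended: keep it only if it is a flag at depth 0.
            if depth == 0 && token.starts_with('-') {
                required.insert(std::mem::take(&mut token));
            } else {
                token.clear();
            }
            if is_open {
                depth += 1;
            }
            if is_close {
                depth = depth.saturating_sub(1);
            }
        } else {
            token.push(ch);
        }
    }
    required
}

/// A flag is globally required only if every variant requires it.
fn globally_required(variants: &[&str]) -> BTreeSet<String> {
    let mut sets = variants.iter().map(|v| required_in_variant(v));
    let first = match sets.next() {
        Some(set) => set,
        None => return BTreeSet::new(),
    };
    sets.fold(first, |acc, set| acc.intersection(&set).cloned().collect())
}
```

For a single variant such as `qsv implode [options] -k <keys> -v <value>`, the result is `-k` and `-v`. A flag that appears in only one of two variants, or only inside an alternative group, drops out of the intersection.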

Fixes false-positive required markers on:
- split: `--size` / `--chunks` / `--kb-size` (alternative group)
- joinp: `--cross` / `--non-equi` (separate Usage lines)
- apply / applydp: `--new-column` / `--replacement` / `--formatstr`
  (subcommand-scoped, not global)
- describegpt: `--prepare-context` / `--process-response`
  (separate Usage lines)
- fetchpost: `--payload-tpl` (alternative inside `(A | B)`)
- luau / py: `--no-headers` (the `-n <main-script>` Usage role only
  appears in one variant; intersection excludes it)
- py: `--helper` (only required on one Usage variant)

Implode's `--keys` / `--value` markers are preserved (genuinely required
in the single Usage variant).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(generators): share required-options detection in a common module

Addresses roborev review #1625. Extracts the previously-duplicated
required-option detection (and its helpers) out of both generators and
into a new crate::generators_common module. Both mcp_skills_gen and
help_markdown_gen now delegate to it, keeping their detection semantics
in lockstep.

Fixes additional review findings while consolidating:

- Bidirectional short↔long expansion via a new FlagPairs type, so Usage
  lines that mention only the long form also surface the short form in
  the required set (and vice versa).
- Bracket-depth is now u32, and uses u32::saturating_sub so an
  unbalanced `]` cannot underflow the counter (on i32 it would have
  saturated at i32::MIN, silently dropping later required tokens).
- The pair regex now matches long-first (`--keys, -k`) declarations in
  addition to short-first (`-k, --keys`).
- The pair regex scans only the options-declaration portion of the
  USAGE string (after the Usage: block), so a future quirk in the
  Usage: block can't introduce a bogus pair.
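The bracket-depth point can be illustrated in isolation. The sketch below assumes a hypothetical helper that tracks only square brackets; it shows that a stray `]` leaves an unsigned, saturating counter at zero, so tokens that follow are still seen at depth 0.

```rust
/// Depth recorded after each bracket character in `s`.
/// Hypothetical helper, tracking `[` and `]` only.
fn bracket_depths(s: &str) -> Vec<u32> {
    let mut depth: u32 = 0;
    let mut seen = Vec::new();
    for ch in s.chars() {
        match ch {
            '[' => {
                depth += 1;
                seen.push(depth);
            }
            ']' => {
                // saturating_sub clamps at 0 instead of wrapping or panicking.
                depth = depth.saturating_sub(1);
                seen.push(depth);
            }
            _ => {}
        }
    }
    seen
}
```

For the unbalanced input `] [a] -k`, the recorded depths are 0, 1, 0, so `-k` is still evaluated at the top level.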

Adds 12 unit tests covering: single-variant required expansion,
alternative groups `(A|B|C)`, multi-variant intersection, subcommand-
scoped options, short-role collision (luau/py), plain `(X)` grouping
without a pipe, nested optional inside an alt group, long-first
declarations, long-only Usage mentions, no Usage block, unbalanced
brackets, and Usage-block scoping of the pair regex.

Regenerator outputs (docs/help and .claude/skills) are unchanged by
this refactor (confirmed with --generate-help-md / --update-mcp-skills
producing no diffs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(generators_common): narrow options-section scope, cache regexes, handle continuations

Addresses roborev review #1626:

1. Options-section scope: replace the blank-line-after-Usage heuristic
   with a proper line-anchored, case-insensitive `options:` / `Options:`
   header regex. The previous logic landed on description paragraphs
   (most qsv USAGEs have a description between the Usage block and the
   options section), so the pair scan was broader than intended.

2. Fallback for minimal fixtures: if no options header is found, fall
   back to scanning the whole USAGE so short↔long pairs declared in
   non-standard layouts (or small test fixtures) still register.

3. Compiled regexes (short-first pair, long-first pair, flag scanner,
   options header) are now cached via `std::sync::OnceLock`, removing
   per-call recompilation overhead.

4. Usage-block collection now terminates only on a blank line and merges
   docopt continuation lines (indented lines within the block that do
   not begin with `qsv`) into their parent variant, preventing a wrapped
   Usage line from showing up as a standalone variant and silently
   narrowing the intersection.

Adds three more unit tests covering: continuation-line joining
(`continuation_line_does_not_truncate_usage_block`), scope narrowing
(`pair_regex_scans_only_the_options_section_not_description`), and the
whole-text fallback (`fallback_to_whole_text_when_no_options_section`).
15/15 unit tests pass; regenerator output is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(generators_common): tighten options-header regex and indentation-aware continuation

Addresses roborev review #1627:

1. options_section regex: the leading class is now `[ \t\w-]` instead of
   `[\s\w-]`, so it can't straddle a newline. Trailing `[ \t]*` matches
   only the same-line whitespace, keeping the "line-anchored" claim in
   the doc comment literally true.

2. collect_usage_lines: drop the dead `if trimmed.is_empty()` branch —
   the `take_while(!blank)` already filters blank lines out.

3. collect_usage_lines: continuation detection now prefers indentation
   depth (leading-whitespace count strictly greater than the parent
   variant's) with the `qsv`-prefix rule as a tiebreaker. A continuation
   line whose positional begins with `qsv` (e.g. `qsv-input`) would
   previously have been treated as a new variant.

4. collect_usage_lines: inline `Usage: qsv foo ...` header variants are
   now retained. Previously the `Usage:` line was unconditionally
   dropped, silently losing the only variant for that style of help
   text.

Three additional unit tests lock this in: `Common options:` /
`map options:` prefix-word headers both matched; a tab-indented
`\toptions:` header; and an inline `Usage: qsv foo ...` variant. 18/18
unit tests pass; full `cargo test` passes; generator outputs unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(generators_common): baseline-indent continuation detection + cleanups

Addresses roborev review #1628:

1. Indentation comparison now uses a single *baseline indent* (the leading
   whitespace of the first non-blank line in the Usage block) instead of
   comparing each line against the previous variant's indent. This fixes
   the inline-vs-non-inline asymmetry: an inline `Usage: qsv foo --bar`
   header was stored trimmed (leading_ws=0), so a following standard-
   column variant like `       qsv foo --baz` (leading_ws=7) was wrongly
   merged as a continuation. Inline variants are now synthesized at the
   baseline indent for consistent comparison.

2. With the baseline-indent rule, indentation genuinely *outranks* any
   prefix test: continuation == leading_ws > baseline. No more confused
   comment claiming "tiebreaker" while the code actually OR'd. The
   `qsv`-prefix check is gone — indentation is the sole signal.

3. `if let Some(last) = variants.last_mut()` replaces the `match + later
   unwrap()` pattern, removing the SAFETY comment.

Three new unit tests lock this in:
- `indented_wrap_line_merges_into_parent_variant`
- `continuation_starting_with_qsv_prefix_is_still_a_continuation`
  (a deeper-indented `qsv-foo` continuation must fold, regression for
  the indentation-outranks-prefix intent)
- `inline_usage_plus_indented_second_variant_stays_separate`
  (regression for the storage asymmetry from finding #1)

21/21 unit tests pass; generator outputs unchanged; clippy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(generators_common): derive baseline indent from min leading-ws

Addresses roborev review #1629:

1. Baseline-indent is now the *minimum* leading whitespace across all
   non-blank lines in the Usage block (not the leading_ws of raw[0]).
   In well-formed docopt, continuation lines are always indented deeper
   than their variants, so min reliably picks the variant column —
   including when raw[0] happens to be a wrapped continuation of an
   inline `Usage:` variant (previously baseline would have drifted to
   the continuation's indent, causing real variants at the standard
   column to be misclassified).

2. The continuation branch now fails loudly (debug_assert) when hit
   with an empty variants vec rather than silently promoting the line,
   so a future refactor of the baseline derivation can't regress
   unnoticed.

3. Replaced `.map().next().unwrap_or(0)` with `.iter().map(...).min()`
   — both a cleaner expression and the fix for the baseline-drift bug.

4. Added `inline_usage_with_wrapped_continuation_and_second_variant` to
   lock in the exact corner case: inline `Usage: qsv foo --bar`, an
   indented wrap line, and a second variant at the standard column —
   baseline must land on the standard column, the wrap line must merge
   into variant 1, and the second variant must stay separate.

5. Doc-comment reworded to describe the actual invariant: "at or below
   the baseline indent start new variants; strictly deeper lines are
   merged."
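The baseline rule described above can be sketched as below. This is an illustration under stated assumptions rather than the actual `collect_usage_lines`: the input is taken to be the lines of the Usage block with the `Usage:` header already stripped, indentation is measured in bytes of leading whitespace, and the function name is made up.

```rust
/// Number of leading whitespace bytes on a line.
fn leading_ws(line: &str) -> usize {
    line.len() - line.trim_start().len()
}

/// Group the lines of a Usage block into variants. Lines at the baseline
/// indent start a new variant; strictly deeper lines are merged into the
/// variant before them.
fn collect_variants(usage_block: &[&str]) -> Vec<String> {
    // Baseline = minimum indent across all non-blank lines.
    let baseline = usage_block
        .iter()
        .copied()
        .filter(|l| !l.trim().is_empty())
        .map(leading_ws)
        .min()
        .unwrap_or(0);

    let mut variants: Vec<String> = Vec::new();
    for line in usage_block.iter().copied().filter(|l| !l.trim().is_empty()) {
        let trimmed = line.trim();
        if leading_ws(line) > baseline {
            if let Some(last) = variants.last_mut() {
                // Continuation: fold into the parent variant.
                last.push(' ');
                last.push_str(trimmed);
                continue;
            }
        }
        variants.push(trimmed.to_string());
    }
    variants
}
```

Because indentation is the only signal, a wrapped line that happens to start with `qsv` (for example a `qsv-input` positional) still folds into its parent as long as it is indented deeper than the baseline.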

22/22 unit tests pass; generator outputs unchanged; clippy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(skills): add optional Option.required to TypeScript schema

The Rust generator emits `"required": true` on required options in the
MCP skill JSON (via the newly-added Option_.required field in
mcp_skills_gen.rs). The TypeScript Option interface didn't declare it,
so consumers using the typed view wouldn't see the field even though
the JSON carries it.

Adds `required?: boolean` with a doc comment matching the generator's
semantics: emitted only when true, omitted for optional options.

Addresses Copilot review on PR #3734.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jqnatividad added a commit that referenced this pull request Apr 28, 2026
- (#1 #2) Replace SAMPLE_TEST_PORT + SAMPLE_TEST_HOST (which
  duplicated the port and could drift) with a single SAMPLE_TEST_PORT +
  SAMPLE_TEST_BIND_HOST literal. URL-builder and bind() both derive from
  the same source — no more brittle .split(':').next().unwrap() that
  would also panic on IPv6 hosts.
- (#3) Wrap the ServerHandle in a SampleWebServer RAII guard. The
  server now stops in Drop, so a panic inside read_stdout / stdout
  doesn't leak the port and cascade into "Address already in use" on
  the next #[serial] test.
- (#4) Call wrk.assert_success(&mut cmd) before reading stdout in
  the success-path tests, so a regression that makes qsv exit non-zero
  surfaces qsv's stderr instead of a generic CSV-parse error.

77/77 sample tests pass; clippy --bin qsv clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jqnatividad added a commit that referenced this pull request Apr 28, 2026
…3775)

* test/sample: add integration tests for streaming Bernoulli URL path

Closes the test-coverage gap flagged in PR #3774. Stands up a local
actix-web fixture (port 8082, distinct from test_fetch's 8081) and
exercises the boundary detection and validation guards added there:

- sample_bernoulli_url_quoted_newline_header: header field 0 contains a
  literal `\n` inside an RFC-4180 quote. Asserts the header arrives
  intact (3 fields, embedded newline preserved) and that every emitted
  data row also has 3 fields. Old code would have split on the raw
  byte and corrupted every following record.
- sample_bernoulli_url_max_size_truncation: serves a ~1.2 MiB CSV with
  fixed 100-byte records so `--max-size 1` cuts deterministically
  inside record 10486. Asserts max id <= 10485 (no half-record at the
  cap) and that every emitted row is well-formed.
- sample_bernoulli_url_404_fails_fast: hits an unmapped path on the
  fixture server. Asserts qsv exits with error instead of feeding the
  HTML 404 body into the csv parser (regression for the missing
  `error_for_status()`).
- sample_bernoulli_url_custom_delimiter: serves a TSV and passes
  `--delimiter '\t'`. Reads raw stdout and splits on tab (the writer
  also honors --delimiter, so read_stdout's comma parser would
  collapse rows). Asserts header and data rows split into 3 fields.
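
The quoted-newline case in the first test can be illustrated with a small quote-aware boundary scan. This is a sketch under assumptions: `first_record_end` is a hypothetical name, and the actual boundary detection in qsv's streaming path may work differently.

```rust
/// Return the exclusive end offset of the first CSV record in `buf`,
/// or `None` if no line feed outside quotes has been seen yet.
/// An escaped quote (`""`) toggles the state twice, so it nets out to no change.
fn first_record_end(buf: &[u8]) -> Option<usize> {
    let mut in_quotes = false;
    for (i, &b) in buf.iter().enumerate() {
        match b {
            b'"' => in_quotes = !in_quotes,
            b'\n' if !in_quotes => return Some(i + 1),
            _ => {}
        }
    }
    None
}
```

A scan for the first raw `\n` byte would stop inside the quoted header field, which is the corruption the test guards against.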

Tests use #[serial] so they don't race on the port. 77/77 sample tests
pass; clippy --bin qsv clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* typo: mis-split->split incorrectly

* test/sample: address review feedback on streaming Bernoulli tests

- (#1 #2) Replace SAMPLE_TEST_PORT + SAMPLE_TEST_HOST (which
  duplicated the port and could drift) with a single SAMPLE_TEST_PORT +
  SAMPLE_TEST_BIND_HOST literal. URL-builder and bind() both derive from
  the same source — no more brittle .split(':').next().unwrap() that
  would also panic on IPv6 hosts.
- (#3) Wrap the ServerHandle in a SampleWebServer RAII guard. The
  server now stops in Drop, so a panic inside read_stdout / stdout
  doesn't leak the port and cascade into "Address already in use" on
  the next #[serial] test.
- (#4) Call wrk.assert_success(&mut cmd) before reading stdout in
  the success-path tests, so a regression that makes qsv exit non-zero
  surfaces qsv's stderr instead of a generic CSV-parse error.
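
The RAII idea in (#3) can be sketched with the standard library alone. `ServerGuard` and `cleanup_ran_after` are hypothetical stand-ins; the real SampleWebServer wraps a server handle, whereas this sketch only flips a flag in `Drop` to show that cleanup still runs when the guarded code panics.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a server guard: `stopped` flips when the guard drops.
struct ServerGuard {
    stopped: Arc<AtomicBool>,
}

impl Drop for ServerGuard {
    fn drop(&mut self) {
        // A real guard would ask the server to shut down here.
        self.stopped.store(true, Ordering::SeqCst);
    }
}

/// Run `body` while a guard is alive and report whether cleanup ran,
/// even if `body` panicked (the panic is caught and discarded).
fn cleanup_ran_after<F: FnOnce()>(body: F) -> bool {
    let stopped = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&stopped);
    let _ = catch_unwind(AssertUnwindSafe(move || {
        let _guard = ServerGuard { stopped: flag };
        body();
    }));
    stopped.load(Ordering::SeqCst)
}
```

Because `Drop` runs during unwinding, a panicking test body releases the resource before the next serial test starts.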

77/77 sample tests pass; clippy --bin qsv clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test/sample: address Copilot review on PR #3775 — single-run cmd, server start timeout

- Replace the assert_success-then-read_stdout double-run pattern with a
  single capture-and-parse helper. The previous shape ran qsv twice per
  test, doubling fixture-server requests (and the ~1.2 MiB max-size
  download) and meaning the parsed stdout came from a different
  execution than the one whose status was asserted.
  - Added run_and_assert_success(): runs once, asserts status, returns
    Output (with stderr surfaced on failure).
  - Added parse_csv_stdout(): mirrors wrk.read_stdout's Vec<Vec<String>>
    shape but reads from a captured buffer.
  - All three success-path tests (quoted newline header, max-size
    truncation, custom delimiter) now use these helpers.
- Switch the SampleWebServer startup channel to send
  Result<ServerHandle, String> and use recv_timeout(10s) instead of
  recv(). A failed bind (e.g., port already in use) used to leave
  start() blocked forever; it now panics fast with the bind error
  surfaced from the server thread.
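
The startup-channel change can be sketched generically. This is an assumption-laden illustration: `start_with_timeout` is a hypothetical helper that returns an error where the commit describes a panic, and `init` stands in for whatever the server thread does to bind and start.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Run `init` on a worker thread and wait at most `timeout` for its outcome.
fn start_with_timeout<T, F>(init: F, timeout: Duration) -> Result<T, String>
where
    T: Send + 'static,
    F: FnOnce() -> Result<T, String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Send the outcome either way so a failed init is reported, not swallowed.
        let _ = tx.send(init());
    });
    match rx.recv_timeout(timeout) {
        Ok(outcome) => outcome,
        Err(mpsc::RecvTimeoutError::Timeout) => Err("startup timed out".to_string()),
        Err(mpsc::RecvTimeoutError::Disconnected) => {
            Err("startup thread exited without reporting".to_string())
        }
    }
}
```

Sending a `Result` rather than only the success value is what lets the caller see the bind error instead of blocking on a message that never arrives.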

77/77 sample tests pass; clippy --bin qsv clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jqnatividad added a commit that referenced this pull request May 10, 2026
- Dedupe build_large_oom_csv into tests/workdir.rs so test_stats and
  test_frequency share one source of truth (Low #1).
- Document the pre-indexed + OOM → sketch fallback path in --memcheck
  USAGE text, CHANGELOG, and docs/STATS_DEFINITIONS.md (Low #2).
- Drop the dead flag_sketch_method='frequent_items' assignment before
  run_frequent_items — confirmed run_frequent_items does not consult
  flag_sketch_method (Low #3).
- Tighten the stats and frequency OOM wwarn messages to "Re-run with
  explicit ... exact to disable the auto-enable" — matches the
  established frequency wording and removes the misleading "override"
  phrasing (Low #4).

Verified Low #5 separately: which_stats() already gates mad on
!approx_quantiles regardless of flag_everything/flag_mad, so the
auto-disable promised by the wwarn is honored.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jqnatividad added a commit that referenced this pull request May 11, 2026
)

* feat(stats,frequency): auto-enable DataSketches estimators on OOM

When --memcheck is set and util::mem_file_check returns OutOfMemory,
stats and frequency now auto-enable their DataSketches-backed estimators
(t-digest + HyperLogLog for stats; Misra-Gries Frequent Items for
frequency) in addition to the existing auto-index fallback. Conflict
guards mirror the explicit-validation rejections so the auto-enable only
flips methods that would have passed validation if set by hand. A wwarn!
lists the auto-enabled estimators; the original OOM is only propagated
when neither fallback engages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(stats,frequency): address roborev #2028 findings

- Dedupe build_large_oom_csv into tests/workdir.rs so test_stats and
  test_frequency share one source of truth (Low #1).
- Document the pre-indexed + OOM → sketch fallback path in --memcheck
  USAGE text, CHANGELOG, and docs/STATS_DEFINITIONS.md (Low #2).
- Drop the dead flag_sketch_method='frequent_items' assignment before
  run_frequent_items — confirmed run_frequent_items does not consult
  flag_sketch_method (Low #3).
- Tighten the stats and frequency OOM wwarn messages to "Re-run with
  explicit ... exact to disable the auto-enable" — matches the
  established frequency wording and removes the misleading "override"
  phrasing (Low #4).

Verified Low #5 separately: which_stats() already gates mad on
!approx_quantiles regardless of flag_everything/flag_mad, so the
auto-disable promised by the wwarn is honored.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* address Copilot review: stream OOM fixture, drop drift-prone line refs, add -- prefix

- tests/workdir.rs: rewrite build_large_oom_csv to stream rows directly to
  a csv::Writer instead of building a 10M-row Vec in memory first. Avoids
  ~1.5 GB of String allocation that would OOM the test harness itself on
  memory-constrained hosts, defeating the purpose of the ignored OOM tests.
- src/cmd/stats.rs, src/cmd/frequency.rs: replace hard-coded intra-file
  line-number references in the try_enable_approx_sketches and
  can_enable_frequent_items doc-comments with descriptive references to
  the validator/dispatch blocks they mirror.
- src/cmd/frequency.rs: add the missing -- prefix to "sketch-method
  frequent_items" in the --memcheck USAGE help text; regenerate
  docs/help/frequency.md and .claude/skills/qsv/qsv-frequency.json.
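
The streaming rewrite of the fixture builder can be sketched as below. The commit uses a csv::Writer; this sketch swaps in std::io::BufWriter so it stays dependency-free, does no CSV quoting, and uses a hypothetical function name and row shape.

```rust
use std::io::{self, BufWriter, Write};

/// Stream `rows` synthetic records to `out` one at a time.
/// No intermediate Vec of rows is built, so memory use stays flat.
fn write_rows_streaming<W: Write>(out: W, rows: u64) -> io::Result<()> {
    let mut w = BufWriter::new(out);
    writeln!(w, "id,value")?;
    for i in 0..rows {
        // Each row is formatted and handed to the writer immediately.
        writeln!(w, "{},row-{}", i, i)?;
    }
    w.flush()
}
```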

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(stats,frequency): clarify OOM fallback fires in both NORMAL and CONSERVATIVE mode

The OOM auto-fallback to DataSketches estimators is gated on the result
of util::mem_file_check, NOT on whether --memcheck is set. The
in-memory load check runs unconditionally on the non-parallel path;
--memcheck only switches the check from NORMAL mode (vs. total memory)
to the stricter CONSERVATIVE mode (vs. available + swap × platform
factor). The fallback can therefore trigger without --memcheck too —
just much less often, since NORMAL mode only trips when the file is
larger than ~80% of total RAM.

Rewrote --memcheck USAGE in stats.rs and frequency.rs to:
- Lead with what --memcheck actually does (CONSERVATIVE vs. NORMAL).
- Reference QSV_MEMORY_CHECK as the env-var equivalent.
- Describe the OOM fallback as a behavior of the load check itself,
  not of --memcheck specifically.

Updated CHANGELOG.md and docs/STATS_DEFINITIONS.md to match.
Regenerated docs/help/{stats,frequency}.md and the corresponding MCP
skill JSONs from the new USAGE.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* address Copilot review: real opt-out for OOM auto-enable + assert command success in tests

Two issues raised by Copilot on the OOM auto-fallback:

(1) The wwarn promised users could re-run with explicit
    --quantile-method exact / --cardinality-method exact / --sketch-method
    exact to disable the auto-enable, but the code only checked the
    parsed flag value. docopt fills in the default "exact" either way,
    so an explicit --foo-method exact was indistinguishable from
    omitting the flag — making the documented opt-out a no-op.

    Fix: scan argv for the literal flag names ("--foo-method" or
    "--foo-method=...") to detect explicit user intent. Thread that
    through try_enable_approx_sketches / can_enable_frequent_items via
    new user_set_* parameters; the auto-enable is suppressed when the
    user explicitly provided the flag (regardless of value). Documented
    in STATS_DEFINITIONS.md.

(2) The new OOM tests used wrk.output_stderr, which returns stderr
    regardless of exit status — a command that errored out after
    printing the auto-enable wwarn would still pass the test.

    Fix: add a wrk.stderr_on_success helper that asserts status.success()
    before returning stderr (with the same diagnostic-rich panic format
    as assert_success). Migrate the 6 new stats/frequency OOM tests to
    use it. Other call sites of output_stderr left untouched — they
    test failure paths where non-success is intentional.
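
The argv scan described in (1) might look like the sketch below. `flag_in_argv` is a hypothetical name and body, not the qsv implementation; it only shows how an explicit `--foo-method exact` can be told apart from an omitted flag once the parser has filled in the default.

```rust
/// Return true if `flag` appears in `argv` either as its own token
/// (`--flag value`) or in the `--flag=value` form.
fn flag_in_argv<S: AsRef<str>>(argv: &[S], flag: &str) -> bool {
    argv.iter().any(|arg| {
        let arg = arg.as_ref();
        arg == flag
            || arg
                .strip_prefix(flag)
                .map_or(false, |rest| rest.starts_with('='))
    })
}
```

Requiring `=` after the prefix keeps a longer flag that merely starts with the same text from counting as a match.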

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(frequency): restore can_enable_frequent_items doc comment

Address roborev #2032: in the previous commit (d9fe03e), `argv_has_flag`
and its doc comment were inserted *between* `can_enable_frequent_items`'s
doc comment and the function body. Because Rust doc comments attach to
the next item, the entire docstring block (including the
`user_set_sketch_method` paragraph and the trailing "Returns false if any
conflicting flag is set..." line) bound to `argv_has_flag`, leaving
`can_enable_frequent_items` with no doc comment at all.

Move `argv_has_flag` (with its own 4-line doc) to live *after*
`can_enable_frequent_items`, mirroring the layout in stats.rs where the
ordering was correct.
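
The doc-comment attachment rule behind this fix can be shown in miniature. The function names and bodies here are hypothetical; the broken layout appears only as comments so the block compiles in its fixed form.

```rust
// Broken layout (kept as plain comments):
//
//   /// Docs intended for `can_enable`.
//   /// Docs for `helper`.
//   fn helper() {}        // receives BOTH doc blocks
//   fn can_enable() {}    // ends up with no docs at all

/// Docs for `can_enable`: allowed only when the user did not set the flag.
fn can_enable(user_set_flag: bool) -> bool {
    !user_set_flag
}

/// Docs for `helper`: placed after `can_enable`, so each block stays with its item.
fn helper(args: &[&str]) -> usize {
    args.len()
}
```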

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jqnatividad added a commit that referenced this pull request May 27, 2026
…, Croissant (#3908)

* feat(profile): comprehensive DCAT-US v3 support (Catalog, GSA bundle, force semantics)

Closes the five gaps that kept `qsv profile` from being an agency-grade
DCAT-US v3 reference tool:

- Vendor the full GSA JSON Schema bundle (26 definitions + 2 qsv
  overlays + MANIFEST.json + refresh README) under resources/dcat-us-v3/,
  pinned to upstream commit cf8789002. `--validate-dcat` now runs against
  the full bundle via `referencing::Registry`, dispatching the Dataset
  or Catalog overlay by the emitted `@type`. A `curie::strip_curies`
  pre-pass bridges qsv's JSON-LD-compact output to GSA's unprefixed
  schema keys without touching the emitted JSON on disk.

- Add `--catalog` flag that wraps the Dataset inside a `dcat:Catalog`
  envelope (`Catalog{dataset:[...]}`) for federation harvesters.

- Emit nine new optional v3 fields with natural data sources:
  Dataset-level `dct:created`, `dcat:version`, `dcat:versionNotes`;
  Distribution-level `dcat:checksum` (SHA-256 via sha2), `dcat:compressFormat`,
  `dcat:packageFormat`, `dcat:spatialResolutionInMeters`, `dct:language`,
  `dct:conformsTo`. Widen `dct:conformsTo` to array per v3 cardinality;
  emit `dct:license` as string and `dcat:byteSize` as string to match
  the GSA schemas' declared shapes.

- Implement full `force: true` override semantics across all three
  --initial-context subtrees. `context::collect_forced_paths` now walks
  package/resource entries through a 47-entry `ckan_to_dcat` mapping
  table; `apply_force_overrides` in `run()` applies forced leaves
  LAST so they beat both inferred and discovered metadata.
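
The CURIE-stripping pre-pass can be sketched at the level of a single key. This is an assumption: the real `curie::strip_curies` presumably walks a whole JSON value, and the prefix list used below is illustrative rather than the one qsv ships.

```rust
/// Strip a known CURIE prefix from a JSON-LD key, e.g. `dct:title` -> `title`.
/// Keys without a listed prefix (including `@type` and full IRIs) are returned as-is.
fn strip_curie_key<'a>(key: &'a str, prefixes: &[&str]) -> &'a str {
    match key.split_once(':') {
        Some((prefix, local))
            if !local.is_empty() && prefixes.iter().any(|p| *p == prefix) =>
        {
            local
        }
        _ => key,
    }
}
```

Matching against an explicit prefix list, rather than stripping everything before the first colon, is what keeps IRIs such as `https://...` intact.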

Pipeline precedence (low → high): inferred → discovered → dataset_info
pointers → forced leaves → schema validation.

Bumps profile feature: adds `sha2 = "0.10"` as a direct dep. Test
counts: 143 unit (was 96, +47) and 29 integration (was 18, +11) all
passing, plus a new bundle pin guard test that re-hashes every
vendored schema against MANIFEST.json on each run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(profile): scaffold YAML-driven projection engine (Stage 1)

Lay the foundation for the YAML-driven multi-profile projection engine
described in plan §1-§2. New modules are wired into profile.rs but the
orchestrator still calls the legacy dcat.rs path — zero behavior change
shipped here. Subsequent stages (§3-§8) populate the profile YAMLs,
swap the orchestrator, and delete the legacy hardcoded modules.

New modules:
* src/cmd/profile/profile_spec.rs — ProfileSpec serde types, embedded-
  first load() with case-insensitive name resolution, file-path fallback,
  6 unit tests.
* src/cmd/profile/projection.rs — generic project() engine with
  ProjectionMode { Dataset, Catalog }, ProjectionWarning { Severity },
  wrap_as_catalog, for_each_column RecordSet expansion, profile-aware
  lookup/field_mapping closures, dry_compile validator, 9 unit tests.
* src/cmd/profile/discovery_merge.rs — merge() with fill-if-absent,
  overlay-array, never strategies; never_overwrite + forced_paths
  protection; 5 unit tests.
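
A fill-if-absent merge with protected keys might be sketched like this. The flat string map, the function name, and the reading of never_overwrite as "discovery never touches this key" are all assumptions; the real merge() operates on JSON values and a ProfileSpec.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Copy discovered entries into `inferred` only where the key is absent,
/// skipping keys that are protected or were forced by the user.
fn merge_fill_if_absent(
    inferred: &mut BTreeMap<String, String>,
    discovered: &BTreeMap<String, String>,
    never_overwrite: &BTreeSet<String>,
    forced_paths: &BTreeSet<String>,
) {
    for (key, value) in discovered {
        if never_overwrite.contains(key) || forced_paths.contains(key) {
            continue;
        }
        // `entry().or_insert_with` leaves an existing inferred value untouched.
        inferred.entry(key.clone()).or_insert_with(|| value.clone());
    }
}
```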

Helper additions (formula_helpers.rs):
* Filters: only_if_absolute_iri, basename, file_stem,
  sanitize_iso_8601_interval, format_mailto.
* Globals: sha256_of (streaming), blake3_of (mmap+rayon), file_size_of,
  compress_format, package_format, build_csvw_schema.

Helpers needing profile state (lookup, field_mapping) live in
projection.rs::register_profile_helpers as closures over the
ProfileSpec; they unwrap_or(UNDEFINED) so | default chains work.

USAGE additions:
* --profile <name|path>: embedded names (dcat-us-v3, dcat-ap-v3,
  croissant) resolved first; falls back to file path. Not yet
  consumed in run() — wired up in Stage 4.

Placeholder YAMLs under resources/profiles/ exist so include_str! resolves
during Stage 1 builds; they will be replaced with real content in
Stages 3 (DCAT-US v3), 6 (DCAT-AP v3), 7 (Croissant).

Verification:
* cargo build --bin qsv -F profile,feature_capable — clean (23 expected
  dead-code warnings for the unused scaffold).
* cargo test cmd::profile:: — 163 passed (+20 new tests).
* cargo test --test tests test_profile:: — 29 passed (no regression).
* cargo +nightly fmt — applied.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(profile): capture goldens from legacy engine before YAML swap (Stage 2)

Lock the byte-equivalent output of the current hardcoded dcat.rs engine
against three regression fixtures so Stage 3's YAML-driven projection
can be asserted to produce identical Dataset + Catalog blocks.

Goldens captured by running today's qsv profile against each fixture
with the canonical --initial-context template, then normalizing via jq
to strip the only path-dependent field (qsv:sourcePath inside
dcat:distribution). Everything else in the .dcat block — including
dcat:byteSize, dcat:checksum, dct:modified, csvw:tableSchema — is
deterministic for fixed input and is captured verbatim.

Fixtures (under tests/resources/profile/golden/):
* nyc-311-subset.csv (10 rows) — geocoded urban service requests:
  lat/lon present, mixed Open/Closed status, multi-agency.
* usda-soil-subset.csv (10 rows) — scientific numeric data: pH,
  organic_carbon_pct, nitrogen_pct, clay/sand/silt percentages.
* wprdc-311-subset.csv (10 rows) — Pittsburgh 311 records:
  capitalized headers, X/Y geo, council districts + wards.

Goldens per fixture:
* <fixture>.dataset.expected.json — the .dcat block from Dataset mode.
* <fixture>.catalog.expected.json — the .dcat block from --catalog mode.

.gitignore whitelists tests/resources/profile/golden/*.{csv,expected.json}
so the *.json + *.csv blanket-ignores don't strip them.

These goldens will drive Stage 3's dcat_us_v3_golden_parity_dataset
and dcat_us_v3_golden_parity_catalog tests; CI hard-fails on drift.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(profile): ship dcat-us-v3.yaml profile (Stage 3, partial)

Author resources/profiles/dcat-us-v3.yaml — the full DCAT-US v3
projection definition that will replace the hardcoded dcat.rs engine
in Stage 4. The YAML mirrors the legacy add_* functions field-for-field
in declaration order so serde_json::Map insertion preserves wire-shape
parity (verified against the Stage-2 goldens at swap time).

Profile content:
* 4 vocabularies (license_iri, accrual_periodicity, iso_639_1,
  csvw_datatype) — each migrated verbatim from the legacy Rust
  constants. The EU vocab IRIs retain http:// scheme per their
  canonical published identifiers; DevSkim DS137138 suppressed per
  line.
* 53 field_mappings — same CKAN→DCAT pointer table the legacy
  ckan_to_dcat::CKAN_TO_DCAT held, in identical declaration order so
  alias-resolution precedence is preserved.
* dataset.fields[] — 23 entries covering core identity, provenance,
  contact point (required), classification, coverage, US codes
  (recommended), governance, and extended metadata. emit_when guards
  match the legacy `if let Some(...)` shapes.
* distribution.fields[] — 22 entries covering title, description,
  download URL, format/license/restrictions, language/conformance,
  file-derived facts (byteSize, checksum, compress/package format),
  spatial resolution, and csvw:tableSchema.
* catalog block reproduces wrap_as_catalog's envelope (Catalog of
  <title>, dct:conformsTo, dct:publisher inheritance).
* discovery_merge: enabled, never_overwrite=[@context,@type,
  dcat:distribution], fill-if-absent strategy.
* validation: enabled against the vendored GSA bundle under
  resources/dcat-us-v3/ with the same 11 strippable CURIE prefixes.

dry_compile verification:
A new unit test (embedded_dcat_us_v3_parses_and_dry_compiles)
parses the embedded YAML and runs projection::dry_compile() against
it — exercising every template's minijinja compile path. All
templates compile clean.

The actual byte-equivalent parity test (running each Stage-2 fixture
through projection::project() and asserting against goldens) lands
in Stage 4 alongside the orchestrator swap — at that point the
engine actually consumes the YAML.

Reference sources cross-checked for content:
  https://github.com/GSA/dcat-us/
  https://resources.data.gov/resources/dcat-us3/
  the vendored GSA bundle under resources/dcat-us-v3/

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(profile): handoff #3 — YAML projection engine, Stages 1-3 landed

Captures the current state after the YAML-driven projection migration's
first three commits. Documents what's wired (scaffold + helpers + flag +
goldens + DCAT-US v3 YAML), what's still on the legacy path (dcat.rs
drives output), and a 9-sub-step Stage 4 plan for the orchestrator swap.

Supersedes profile2-handoff.md for post-PR-#3901 work. Key gotchas
distilled into §5: lookup helpers must return Value::UNDEFINED,
goldens only normalize qsv:sourcePath, field-mapping count is 53 not 47.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(profile): wire YAML projection engine into orchestrator (Stage 4a)

profile.rs::run now routes through projection::project() with the
loaded ProfileSpec (default: dcat-us-v3). The YAML engine produces
byte-equivalent output to the legacy dcat.rs path on all 6 golden
fixtures (3 inputs × dataset/catalog modes), verified by new parity
integration tests.

Orchestrator changes:
* Load profile via profile_spec::load(args.flag_profile | "dcat-us-v3")
  at the top of run(), then projection::dry_compile() to fail fast on
  malformed embedded YAML.
* ContextArgs gains a `profile: &ProfileSpec` field; context::build
  threads it to load_initial_context → collect_forced_paths so the
  CKAN→target pointer translation uses profile.field_mappings instead
  of importing ckan_to_dcat.
* Replace dcat::build() call with projection::project(&profile,
  &projection_ctx, mode) — the projection_ctx carries pkg, res, stats,
  dpp, source_label, local_path matching the YAML's template names.
* Replace merge_discovered() with discovery_merge::merge(&profile,
  inferred, discovered, forced_dcat_paths) — same /dcat/<key> forced-
  path semantics, now driven by profile.discovery_merge.
* Catalog wrap baked into project() via ProjectionMode::Catalog
  (chosen upfront based on flag_catalog); orchestrator no longer
  calls catalog::wrap_as_catalog at the warning-filter step.
* Stash key renamed __pending_dcat_warnings →
  __pending_projection_warnings.
* DcatWarning → ProjectionWarning conversion bridges dcat_validate
  and run_profile_validation outputs (Stage 5 will refactor those
  modules to return ProjectionWarning directly).

Engine improvements:
* projection::project sets UndefinedBehavior::Chainable so
  `pkg.dpp_suggestions.spatial_extent.value | default("")` walks
  missing intermediates gracefully (matches legacy dcat.rs semantics
  where absent keys silently fall through).
* New file-aware helpers in formula_helpers.rs:
  - bbox_from_dpps(dpp, stats) — lat/lon column → POLYGON-WKT
    `dct:Location` array, mirroring legacy dcat::bbox_from_dpps.
  - temporal_from_dpps(dpp, stats) — date columns → array of
    `dct:PeriodOfTime`, one per inferred date column.
  - build_csvw_schema(stats) — walks the column-name → stats-blob map,
    emitting `{columns: [...]}` with name, titles, datatype,
    qsv:cardinality / nullcount / min / max.
  - csvw_datatype_legacy helper mirrors the legacy mapping
    (Float → double, Integer → integer, Date → date, etc.).
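
The datatype mapping named in the last bullet can be sketched as a plain match. Only the three pairs listed above come from the commit message; the `string` fallback and the function name are assumptions made for illustration.

```rust
/// Map a qsv-inferred column type name to a CSVW datatype name.
fn csvw_datatype(qsv_type: &str) -> &'static str {
    match qsv_type {
        "Float" => "double",
        "Integer" => "integer",
        "Date" => "date",
        // Assumption: types not listed in the commit message fall back to string.
        _ => "string",
    }
}
```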

dcat-us-v3.yaml updates:
* dct:spatial / dct:temporal fields call bbox_from_dpps /
  temporal_from_dpps as fallbacks behind the formula-derived WKT
  suggestion.
* dct:license emits a plain string (legacy license_value shape) via
  `{{ lookup("license_iri", raw) | default(raw) }}`, not the previous
  `{"@id": ...}` object form (GSA Distribution.json declares license
  as anyOf:[null,string]).
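
The `lookup(...) | default(raw)` shape of the license template amounts to a vocabulary lookup that falls back to the raw input. A minimal Rust sketch of that behavior follows; the function name and the vocabulary entry used in any call are hypothetical, not the contents of the shipped `license_iri` table.

```rust
use std::collections::HashMap;

/// Resolve `raw` through the vocabulary, or emit it unchanged as a plain
/// string when no mapping exists.
fn license_value(vocab: &HashMap<&str, &str>, raw: &str) -> String {
    vocab.get(raw).copied().unwrap_or(raw).to_string()
}
```

Either branch yields a plain string, which matches the string-or-null shape the GSA Distribution schema is described as declaring.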

Tests:
* 2 new integration tests (dcat_us_v3_golden_parity_dataset /
  _catalog) iterate the 3 fixtures and assert byte-equivalent .dcat
  output against the goldens.
* discovery_merge test: forced-path form switched from "/dct:title"
  to "/dcat/dct:title" so it matches the legacy dataset_info pointer
  shape; +1 new test for nested-path force blocking top-level merge.
* All 6 goldens refreshed to current legacy output (the original
  Stage-2 capture had alphabetical stats-cache state).
* Full test sweep: 165 unit + 31 integration tests pass, 0 failures.

The legacy dcat.rs / catalog.rs / ckan_to_dcat.rs / curie.rs modules
are still in tree (their tests still run via cmd::profile::*) but no
longer participate in the engine path. Stage 4b deletes them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(profile): delete legacy hardcoded engine + refactor validator (Stages 4b + 5)

The YAML-driven projection engine is now the only path. Stage 4a wired
projection::project() into run() with byte-equivalent output against
the goldens; this commit cleans up by deleting the legacy modules and
refactoring dcat_validate to consume the active ProfileSpec.

Deletions (~2400 LOC):
* src/cmd/profile/dcat.rs (1738 LOC) — the 9 add_* helpers,
  bbox_from_dpps, temporal_from_dpps, csvw_datatype, license_value,
  accrual_periodicity_iri, normalize_iso_639_1. The minijinja-side
  equivalents live in formula_helpers.rs + dcat-us-v3.yaml.
* src/cmd/profile/catalog.rs (154 LOC) — wrap_as_catalog moved into
  projection::wrap_as_catalog.
* src/cmd/profile/ckan_to_dcat.rs (271 LOC) — CKAN_TO_DCAT table
  moved verbatim into dcat-us-v3.yaml's field_mappings:; the lookup
  is now ProfileSpec::translate_ckan_ptr.
* src/cmd/profile/curie.rs (225 LOC) — strip_curies is now an inline
  helper in dcat_validate.rs driven by
  profile.validation.strippable_curie_prefixes.
* mod declarations for the deleted modules in profile.rs.

dcat_validate.rs refactor (Stage 5):
* New public API: validate(profile: &ProfileSpec, block: &Value) ->
  Vec<ProjectionWarning>. When profile.validation.enabled == false
  (DCAT-AP v3, Croissant), returns vec![] without touching the
  schema.
* Inline strip_curies / strip_curie_key replace the deleted curie
  module; the prefix list comes from
  profile.validation.strippable_curie_prefixes (still byte-identical
  to the legacy list for DCAT-US v3).
* classify_severity now returns projection::Severity instead of
  dcat::Severity.
* Test functions migrate to the new (profile, block) signature by
  loading the embedded dcat-us-v3 profile via profile_spec::load.

profile.rs cleanup:
* dcat_validate::validate_dataset_or_catalog() call → validate().
* run_profile_validation now returns Vec<ProjectionWarning> directly;
  the .into_iter().map(From::from) bridge is gone.

projection.rs cleanup:
* impl From<DcatWarning> for ProjectionWarning removed (no longer
  needed — all warning producers return ProjectionWarning).

Verification:
* cargo build --bin qsv -F profile,feature_capable — clean.
* All 4 binaries build clean: qsv (-F all_features), qsvmcp
  (-F qsvmcp), qsvlite (-F lite), qsvdp (-F datapusher_plus).
* cargo test cmd::profile:: → 127 unit tests pass (down from 165;
  the deleted legacy modules carried 38 tests now obsoleted by the
  YAML+goldens parity coverage).
* cargo test --test tests test_profile:: → 31 integration tests pass
  (29 original + 2 new dcat_us_v3_golden_parity_* tests).

Net Rust LOC delta this commit: −2388 deleted, +60 added (inline
strip_curies + tests) = −2328 LOC. Cumulative since Stage 1:
−2328 + 1525 + 546 = −257 LOC vs the pre-YAML-engine state, AND
all engine knowledge now lives in resources/profiles/dcat-us-v3.yaml
where it's editable without recompiling.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(profile): ship dcat-ap-v3 profile + 4 smoke tests (Stage 6)

DCAT-AP v3 (semiceu.github.io/DCAT-AP/releases/3.0.0/) is now an
embedded profile selectable via --profile dcat-ap-v3. The shape is a
DCAT-US v3 subset, with:

* JSON Schema validation disabled (DCAT-AP ships SHACL upstream; a
  SHACL backend is a future enhancement).
* No dcat-us:* extensions (bureauCode, programCode, accessLevel,
  purpose, liabilityStatement) — those are US-specific.
* New `eu_theme` vocabulary mapping CKAN group slugs to EU
  publications-office authority IRIs
  (http://publications.europa.eu/resource/authority/data-theme/...).
* dcat:accessURL required on Distribution per the v3 spec
  (Mandatory cardinality 1..*).
* dct:conformsTo points at the SEMIC v3 release URL.
* Smaller field_mappings (29 entries vs the 53 in dcat-us-v3) since
  many DCAT-US extensions don't apply.

The same minijinja templates and helpers power both profiles; the
only Rust-side change in this commit is the YAML profile + tests.

Smoke tests (tests/test_profile.rs):
* dcat_ap_v3_emits_no_dcat_us_extensions — verifies the projection
  carries zero dcat-us:* keys even with the full initial-context.
* dcat_ap_v3_distribution_carries_access_url — confirms the
  Distribution-mandatory dcat:accessURL is populated.
* dcat_ap_v3_conforms_to_targets_spec_url — confirms downstream
  consumers can detect the profile via dct:conformsTo.
* dcat_ap_v3_validation_is_disabled_noop — confirms --validate-dcat
  with this profile produces no JSON Schema warnings (the validator
  short-circuits when profile.validation.enabled == false).

Source: https://github.com/SEMICeu/DCAT-AP
Cardinality reference: https://semiceu.github.io/DCAT-AP/releases/3.0.0/

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(profile): ship croissant 1.0 profile + 5 smoke tests (Stage 7)

Croissant ML metadata format (mlcommons.org/croissant) is now an
embedded profile selectable via --profile croissant. The output is
schema.org-rooted JSON-LD conforming to Croissant 1.0:

* @context inlines the canonical Croissant map: @language=en,
  @vocab=https://schema.org/, plus cr:/dct: prefix shorthands. Per
  the Croissant spec at
  https://github.com/mlcommons/croissant/blob/main/docs/croissant-spec.md.
* @type=sc:Dataset; field paths use schema.org bare keys
  (name/description/url/license/creator/publisher/keywords/etc.)
  rather than dcat:/dct: prefixes.
* conformsTo target IRI: http://mlcommons.org/croissant/1.0.
* Distribution emitted under bare `distribution` (schema.org @vocab
  resolves it) with @type=sc:FileObject.
* Per-column cr:RecordSet/cr:Field expansion via the new
  build_croissant_fields helper — one Field per CSV column with
  schema.org dataType (sc:Text / sc:Integer / sc:Float / sc:Boolean
  / sc:Date / sc:DateTime).
* BLAKE3 hash via cr:fileFingerprint (qsv-native mmap+rayon, markedly
  faster than SHA-256 on multi-GB ML training data; Croissant has no
  SPDX-mandated algorithm so the choice is free).
* validation.enabled: false (Croissant uses a Python validator,
  mlcroissant, not JSON Schema).
* discovery_merge.enabled: false (Croissant doesn't live in
  CKAN-style data portals).

Engine extensions:
* DatasetBlock.context now accepts a `Value` (string or object) so
  the inline Croissant @context map round-trips verbatim. DCAT-US /
  DCAT-AP profiles still use a string URI — backwards-compatible.
* DistributionBlock.path lets profiles override the Distribution
  wrapper key. Croissant emits `distribution`; DCAT defaults remain
  `dcat:distribution`.
* New formula helper build_croissant_fields(stats) walks the per-
  column stats map and emits a flat cr:Field array with schema.org
  dataType IRIs.

Smoke tests (5 in tests/test_profile.rs):
* croissant_uses_schema_org_context_and_sc_dataset_type
* croissant_conforms_to_targets_mlcommons_spec
* croissant_emits_recordset_with_one_field_per_csv_column
* croissant_uses_bare_distribution_key_not_dcat_namespaced
* croissant_distribution_uses_file_object_type

Verification: cargo test cmd::profile:: → 127 unit, test_profile::
→ 40 integration tests pass (29 original + 2 parity + 4 DCAT-AP +
5 Croissant).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(profile): regenerate help + finalize handoff (Stage 8)

* docs/help/profile.md regenerated via --generate-help-md to surface
  the --profile flag added in Stage 1.
* profile3-handoff.md updated to reflect all 8 stages landed,
  full file map post-deletion, verification commands, captured
  design decisions, and queued follow-ups.
* src/cmd/profile.rs: drop the now-useless DcatWarning → ProjectionWarning
  conversion in the --validate-dcat code path (Stage 5 already
  refactored validate() to return ProjectionWarning directly).

Verification:
* python3 scripts/docs-drift-check.py → no drift detected.
* All 4 binaries build clean (qsv, qsvmcp, qsvlite, qsvdp).
* cargo test cmd::profile:: → 127 unit tests pass.
* cargo test --test tests test_profile:: → 40 integration tests pass.
* cargo clippy --bin qsv -F profile,feature_capable → no new findings
  in the YAML-engine code path.

This closes the YAML-driven projection engine migration. The shipped
binary always goes through projection::project(); the legacy
dcat.rs / catalog.rs / ckan_to_dcat.rs / curie.rs modules are
deleted. DCAT-US v3 / DCAT-AP v3 / Croissant projection knowledge
lives entirely in resources/profiles/*.yaml — editable without
recompiling.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(profile): address roborev #2490 findings (catalog/discovery/force/validate)

7 findings from the YAML-engine branch review at job 2490. Each fix
ships with a regression guard in tests/test_profile.rs.

Medium severity (6):

1. Catalog mode + discovery merge target (src/cmd/profile.rs:398).
   Discovery was merging into the Catalog envelope top-level instead
   of the nested Dataset. Fix: project Dataset always, apply
   discovery_merge::merge, THEN conditionally wrap in Catalog via the
   new projection::wrap_in_catalog_envelope helper. Guard:
   catalog_mode_merges_discovered_into_inner_dataset_not_envelope.

2. Catalog envelope missing @context (src/cmd/profile/projection.rs:296).
   The envelope carried CURIE keys (dct:title, dct:conformsTo,
   dcat:dataset) without a top-level @context, leaving it invalid as
   JSON-LD. Fix: wrap_as_catalog now copies profile.dataset.context
   into the envelope; inner Dataset keeps its own context for
   self-containment. Guard: catalog_envelope_carries_top_level_context.

3. dct:spatial emits string "null" when no bbox
   (resources/profiles/dcat-us-v3.yaml + dcat-ap-v3.yaml). bbox_from_dpps
   returning UNDEFINED rendered as `"null"` via `| tojson` because
   coerce_json_or_string left the literal alone. Fix: emit_when guard
   gates the field on WKT-or-bbox availability. Guard:
   spatial_field_suppressed_when_no_lat_lon_columns.

4. --dcat-legacy-license parsed but never wired
   (src/cmd/profile.rs:380). Flag was documented + collected into
   Args but never reached the YAML engine. Fix: thread the flag into
   projection_ctx as `legacy_license`, add a conditional Dataset-level
   dct:license field in dcat-us-v3.yaml gated on that variable.
   Guards: dcat_legacy_license_emits_dataset_level_license,
   dcat_legacy_license_off_keeps_license_distribution_only.

5. Forced package/resource values bypass profile shaping
   (src/cmd/profile/context.rs:388). collect_forced_paths was
   writing raw CKAN values to target pointers via
   apply_force_overrides, producing string-where-Agent-expected
   shapes (e.g. forced package.publisher → "Name" instead of
   {"@type":"foaf:Agent","foaf:name":"Name"}). Fix: CKAN-side
   forces now only contribute to `forced_paths` (discovery-merge
   protection); the value lives in merged package/resource via
   normalize_value_force and flows through the profile's templates
   for proper shaping. dataset_info forces still take the
   raw-write path (that's the documented escape hatch).
   Guard: forced_package_publisher_flows_through_profile_template.

6. validate() ignores profile.validation paths
   (src/cmd/profile/dcat_validate.rs:250). When validation.enabled
   was true, the function always used the embedded GSA bundle
   regardless of profile.validation.schema_dir. Fix: when the
   profile's schema_dir matches the embedded `resources/dcat-us-v3/`
   path (the only bundle qsv ships today), use the embedded
   validators; any other schema_dir produces a single
   Recommended-severity warning explaining that custom-bundle
   validation is a queued follow-up. The embedded DCAT-US v3
   profile's behavior is unchanged.
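
The ordering fix from findings 1 and 2 (merge into the Dataset first, wrap
in the Catalog envelope afterwards, and carry the context on the envelope)
can be sketched roughly as below. This is an illustration only: `Object`,
`Output`, `fill_if_absent`, and `merge_then_wrap` are hypothetical names,
and a flat string map stands in for the JSON values the engine handles.

```rust
use std::collections::BTreeMap;

/// Flat stand-in for a JSON-LD object (assumption: the real engine works on JSON values).
pub type Object = BTreeMap<String, String>;

/// Hypothetical result type, used only for this sketch.
#[derive(Debug)]
pub enum Output {
    Dataset(Object),
    Catalog { context: String, datasets: Vec<Object> },
}

/// Fill-if-absent merge: a discovered key never overwrites a projected one.
pub fn fill_if_absent(dataset: &mut Object, discovered: &Object) {
    for (key, value) in discovered {
        dataset.entry(key.clone()).or_insert_with(|| value.clone());
    }
}

/// Merge discovered metadata into the Dataset, then wrap only if requested.
/// Wrapping last guarantees discovery never lands on the envelope itself.
pub fn merge_then_wrap(
    mut dataset: Object,
    discovered: &Object,
    context: &str,
    catalog_mode: bool,
) -> Output {
    fill_if_absent(&mut dataset, discovered);
    if catalog_mode {
        Output::Catalog {
            // The envelope carries its own top-level context.
            context: context.to_string(),
            datasets: vec![dataset],
        }
    } else {
        Output::Dataset(dataset)
    }
}
```

With `catalog_mode` set, the discovered keys end up inside the single
nested dataset, and the envelope holds only the context and the dataset
list.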

Low severity (1):

7. DiscoveryMerge::default() disabled merging
   (src/cmd/profile/profile_spec.rs:273). #[derive(Default)] gave
   `enabled: false`, contradicting the documented "fill-if-absent
   enabled by default" semantics — the `#[serde(default =
   "default_true")]` annotation only fires during deserialization.
   Fix: hand-rolled Default impl with enabled: true, the
   never_overwrite list (@context, @type, dcat:distribution), and
   fill-if-absent strategy.
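
The pitfall in finding 7 is general Rust behavior: a derived `Default`
sets every `bool` to `false`, and a serde field-level default is not
consulted by `Default::default()`. A minimal sketch, using hypothetical
trimmed-down structs rather than the real `DiscoveryMerge`:

```rust
/// Hypothetical stand-in with a derived Default.
/// A field-level serde default would not change what `default()` returns.
#[derive(Debug, Default)]
pub struct DerivedMerge {
    pub enabled: bool,
}

/// Hypothetical stand-in with a hand-rolled Default.
#[derive(Debug)]
pub struct HandRolledMerge {
    pub enabled: bool,
    pub never_overwrite: Vec<String>,
}

impl Default for HandRolledMerge {
    fn default() -> Self {
        Self {
            // Matches the documented "enabled by default" semantics.
            enabled: true,
            never_overwrite: ["@context", "@type", "dcat:distribution"]
                .iter()
                .map(|key| key.to_string())
                .collect(),
        }
    }
}
```

`DerivedMerge::default().enabled` is `false`, while
`HandRolledMerge::default().enabled` is `true`, which is the difference
the fix relies on.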

Golden refresh:
* Catalog goldens (nyc-311, usda-soil, wprdc-311) pick up the new
  envelope @context entry — finding #2 fix.
* usda-soil dataset golden loses the spurious `"dct:spatial":
  "null"` entry — finding #3 fix.

Verification:
* cargo test cmd::profile:: → 127 unit tests pass.
* cargo test --test tests test_profile:: → 46 integration tests pass
  (40 prior + 6 new regression guards).
* All 4 binaries build clean (qsv, qsvmcp, qsvlite, qsvdp).
* cargo +nightly fmt applied.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(profile): drop auto-generated stats caches from golden dir

The previous commit accidentally committed three *.stats.csv files
(qsv stats cache, auto-regenerated on every profile run). They slipped
past .gitignore because the golden-directory *.csv whitelist also
matches the stats.csv suffix.

Fix: add a re-ignore rule for `tests/resources/profile/golden/*.stats.csv`
and the JSONL variant, then `git rm` the committed files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(profile): preserve CKAN-side force against spec formulas (roborev #2491)

Regression introduced by the #2490 fix #5: when CKAN-side `force: true`
values stopped being raw-written via apply_force_overrides, they
became vulnerable to overwrite by spec formulas. A formula targeting
`package.publisher` would replace the forced value in
merge_formula_results' pass-1 (before projection), violating the
documented "force beats inferred" guarantee.

Fix: track the CKAN-side forced field-name sets through the pipeline
so merge_formula_results can skip them.

* context.rs: collect_forced_paths now returns a 4-tuple including
  `forced_package_fields` and `forced_resource_fields`
  (HashSet<String> of CKAN-side field names marked force:true).
  load_initial_context returns the matching 6-tuple; AnalysisContext
  carries both sets.
* profile.rs: merge_formula_results takes the two sets and skips
  pass-1 inserts on matching field names. Suggestion-formula output
  (pass 2) lives in dpp_suggestions and is unaffected.
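
The pass-1 skip described above might look roughly like the following.
The function name and signature are simplified assumptions for
illustration; the real `merge_formula_results` takes more inputs and
operates on JSON values rather than string maps.

```rust
use std::collections::{BTreeMap, HashSet};

/// Simplified pass-1 merge: formula results are written into the package
/// map unless the field name was marked `force: true` on the CKAN side.
pub fn merge_formula_results_sketch(
    package: &mut BTreeMap<String, String>,
    formula_results: &BTreeMap<String, String>,
    forced_package_fields: &HashSet<String>,
) {
    for (field, value) in formula_results {
        if forced_package_fields.contains(field) {
            // Force beats inferred: leave the forced value untouched.
            continue;
        }
        package.insert(field.clone(), value.clone());
    }
}
```

Unforced fields still receive formula output, so only the fields listed
in the forced set are protected.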

The forced value still flows through the profile templates for proper
shaping (so dct:publisher gets its foaf:Agent wrapper, etc.) — the
shaping fix from #2490 #5 is preserved.

Regression guard: forced_package_field_survives_formula_overwrite
(tests/test_profile.rs). Constructs a spec with a `title` formula
that would set "Formula Wins", combined with `package.title:
{value: "Forced Title", force: true}`. The output must carry
"Forced Title" — confirming force beats formula.

Verification:
* cargo test cmd::profile:: → 127 unit tests pass.
* cargo test --test tests test_profile:: → 47 integration tests
  pass (46 prior + 1 new regression guard).
* cargo +nightly fmt applied.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(profile): expand forced CKAN fields through alias mappings (roborev #2493)

Follow-up regression to #2491: the force-skip in merge_formula_results
only checked the exact CKAN field name. Aliases that project to the
same target pointer (e.g. `package.author` and `package.publisher`
both → `/dcat/dct:publisher`) bypassed the check — a formula writing
`publisher` could still overwrite a forced `author` value.

Fix: after the first pass collects forced (ckan_ptr, target_ptr)
pairs, walk profile.field_mappings and add every CKAN field whose
target appears in the forced target set to the forced_pkg /
forced_res field-name set. So forcing `package.author` now also locks
`package.publisher` (and any other alias keys for the same target).
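
A minimal sketch of the alias expansion, assuming `field_mappings` can be
viewed as a map from CKAN field name to target pointer. The function name
and the map shape are hypothetical, not the actual signatures in
context.rs.

```rust
use std::collections::{HashMap, HashSet};

/// Expand a set of forced CKAN field names so that every alias projecting
/// to the same target pointer is also treated as forced.
pub fn expand_forced_aliases(
    field_mappings: &HashMap<String, String>,
    forced_fields: &HashSet<String>,
) -> HashSet<String> {
    // First pass: collect the target pointers of the forced fields.
    let forced_targets: HashSet<&str> = forced_fields
        .iter()
        .filter_map(|field| field_mappings.get(field))
        .map(|target| target.as_str())
        .collect();

    // Second pass: lock every field whose target is in that set.
    let mut expanded = forced_fields.clone();
    for (field, target) in field_mappings {
        if forced_targets.contains(target.as_str()) {
            expanded.insert(field.clone());
        }
    }
    expanded
}
```

Forcing `author` in a mapping where `author` and `publisher` share a
target yields a set containing both names, while fields with other
targets stay out of it.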

Alias pairs covered by this fix in DCAT-US v3:
* author / publisher → dct:publisher
* landing_page / url → dcat:landingPage
* data_dictionary / describedBy → dcat:describedBy
* accrualPeriodicity / frequency / update_frequency → dct:accrualPeriodicity
* dcat-us:accessLevel / access_level → dcat-us:accessLevel
* accessRights / access_rights → dct:accessRights
* scopeNote / scope_note → skos:scopeNote
* liabilityStatement / liability_statement → dcat-us:liabilityStatement
* inSeries / in_series → dcat:inSeries
* versionNotes / version_notes → dcat:versionNotes
* license / license_id → distribution.dct:license
* modified / last_modified → distribution.dct:modified

Regression guards (tests/test_profile.rs):
* forced_author_locks_publisher_alias — forces package.author,
  formula targets `publisher`, asserts foaf:name is "Forced Author".
* forced_license_id_locks_license_alias — forces resource.license_id
  to cc-by, formula targets `license` with cc-by-sa, asserts the
  CC-BY 4.0 IRI (not CC-BY-SA) lands on Distribution.

Verification:
* cargo test cmd::profile:: → 127 unit tests pass.
* cargo test --test tests test_profile:: → 49 integration tests
  pass (47 prior + 2 new alias guards).
* cargo +nightly fmt applied.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* address review: 9 Copilot suggestions on PR #3908

Apply all 9 unresolved inline review comments. Each was verified
against the current code before action.

1. docs/help/profile.md (truncated --initial-context help)
   Reformatted the USAGE block in src/cmd/profile.rs so the
   description survives markdown-table generation: flattened the
   nested bullet list into a single paragraph and added a pointer
   to dcat-init-context.README.md for the full example.

2. tests/resources/profile/dcat-init-context.README.md
   Updated the "How package / resource force flags route to DCAT"
   section to reference the active profile's `field_mappings:` table
   + `ProfileSpec::translate_ckan_ptr` instead of the deleted
   src/cmd/profile/ckan_to_dcat.rs module.

3. src/cmd/profile/profile_spec.rs (load-time validation claim)
   Moved `projection::dry_compile` inside `load()` so the doc claim
   on `EMBEDDED` is now accurate: every template parses through
   minijinja at profile-load time, surfacing typos before
   stats/frequency/formulas run. Dropped the redundant dry_compile
   call from profile.rs::run.

4. profile3-handoff.md (hardcoded absolute path)
   Removed the `/Users/joelnatividad/.claude/plans/...` reference
   to the original plan file; the handoff now describes the engine
   without pointing at a path that doesn't exist for other
   contributors.

5. resources/profiles/croissant.yaml (misplaced key)
   Removed the no-op `strippable_curie_prefixes: []` from the
   `discovery_merge:` block — that key lives under `validation:`
   per the schema; keeping it here was misleading.

6. src/cmd/profile.rs (dead `merge_discovered` + tests)
   Deleted the orphaned legacy `merge_discovered` function (the
   orchestrator now uses `discovery_merge::merge` exclusively) and
   the 9 in-file tests that exercised it. Coverage is preserved by
   the unit tests in src/cmd/profile/discovery_merge.rs and the
   new integration tests in tests/test_profile.rs (e.g.
   `catalog_mode_merges_discovered_into_inner_dataset_not_envelope`).
   Net −168 LOC.

7-8. src/cmd/profile.rs (stale `ckan_to_dcat` doc comments)
   Updated two doc comments (`apply_force_overrides` doc + the
   force-collection comment in `run()`) so future readers find
   `field_mappings:` + `ProfileSpec::translate_ckan_ptr` instead
   of being pointed at the deleted module.

9. resources/dcat-us-v3/README.md (wrong test path)
   The pin-guard test lives at tests/test_profile.rs::dcat_us_v3_bundle_pin_manifest_matches_files,
   not the non-existent tests/test_dcat_us_bundle_pin.rs. Updated
   both the prose reference and the `cargo test` invocation.

Verification:
* cargo build --bin qsv,qsvmcp,qsvlite,qsvdp — all 4 clean.
* cargo test cmd::profile:: → 117 unit tests pass (was 127; the
  10 deleted merge_discovered tests are obsolete).
* cargo test --test tests test_profile:: → 49 integration tests
  pass (unchanged).
* cargo +nightly fmt applied.
* docs/help/profile.md regenerated via --generate-help-md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* address roborev #2495: extend dry_compile + restore IRI escape coverage

Two findings from the post-fix re-review of d78d34c.

Medium (src/cmd/profile/projection.rs:dry_compile):
  The previous load-time validation only checked emit_when guards on
  dataset fields, leaving distribution and catalog field guards
  vulnerable. A typo in a distribution emit_when would compile-pass
  load() but silently render-fail at projection time (render_truthy
  treats the error as false, dropping the field). Fix: extend
  dry_compile to syntax-check emit_when in both distribution and
  catalog field loops. New guards:
  * dry_compile_rejects_malformed_distribution_emit_when
  * dry_compile_rejects_malformed_catalog_emit_when
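
The shape of the extended check can be sketched as below. Everything here
is an assumption for illustration: `Field`, `check_syntax`, and this
`dry_compile` signature are hypothetical, and the balanced-parentheses
check is only a stand-in for parsing the guard with the template engine.

```rust
/// Hypothetical field shape; only the optional guard matters here.
pub struct Field {
    pub name: String,
    pub emit_when: Option<String>,
}

/// Stand-in syntax check (balanced parentheses only).
fn check_syntax(expr: &str) -> Result<(), String> {
    let mut depth: i32 = 0;
    for ch in expr.chars() {
        match ch {
            '(' => depth += 1,
            ')' => depth -= 1,
            _ => {}
        }
        if depth < 0 {
            return Err(format!("unexpected ')' in `{}`", expr));
        }
    }
    if depth == 0 {
        Ok(())
    } else {
        Err(format!("unclosed '(' in `{}`", expr))
    }
}

/// Check guards in all three field groups at load time, so a typo fails
/// loudly instead of being treated as `false` at render time.
pub fn dry_compile(
    dataset: &[Field],
    distribution: &[Field],
    catalog: &[Field],
) -> Result<(), String> {
    let groups = [
        ("dataset", dataset),
        ("distribution", distribution),
        ("catalog", catalog),
    ];
    for (group, fields) in groups {
        for field in fields {
            if let Some(expr) = &field.emit_when {
                check_syntax(expr)
                    .map_err(|err| format!("{}.{}: {}", group, field.name, err))?;
            }
        }
    }
    Ok(())
}
```

Returning the group and field name in the error keeps the failure easy to
trace back to the offending profile entry.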

Low (src/cmd/profile/discovery_merge.rs):
  The removed merge_discovered tests carried regression coverage for
  forced discovered keys containing `/` or `~` (full-IRI JSON-LD
  properties like http://purl.org/dc/terms/title). Restore that
  coverage on discovery_merge's internal escape_token path. New
  tests:
  * forced_full_iri_key_blocks_matching_discovered_key — forced path
    with each `/` escaped to `~1` must block the matching discovered
    IRI key.
  * forced_full_iri_key_does_not_block_unrelated_discovered_key —
    escaping must not over-match; unrelated discovered keys (e.g.
    dct:identifier) still flow through.
  * escape_token_handles_rfc6901_round_trip — direct check of the
    `~`-before-`/` escape order on plain, slash, tilde, mixed, and
    full-IRI inputs.
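
For reference, the RFC 6901 escape order these tests pin down can be
sketched as follows. The function bodies are an illustration of the
standard rule (`~` becomes `~0` before `/` becomes `~1`), not a copy of
the internal escape_token, and `unescape_token` is a hypothetical
companion added to show the round trip.

```rust
/// Escape one JSON Pointer reference token per RFC 6901.
/// `~` must be escaped first; otherwise the `~` introduced by `~1`
/// would itself be rewritten to `~0`.
pub fn escape_token(token: &str) -> String {
    token.replace('~', "~0").replace('/', "~1")
}

/// Reverse the escaping. `~1` is decoded first so that an original
/// `~1` (stored as `~01`) is not mistaken for an escaped slash.
pub fn unescape_token(token: &str) -> String {
    token.replace("~1", "/").replace("~0", "~")
}
```

A full-IRI key such as http://purl.org/dc/terms/title escapes to a
single token with every `/` written as `~1`, and unescaping restores the
original string.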

Verification:
* cargo test cmd::profile:: → 122 unit tests pass (117 prior + 5 new).
* cargo test --test tests test_profile:: → 49 integration tests pass.
* cargo +nightly fmt applied.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>