build(deps): bump jsonschema from 0.28.1 to 0.28.2 #2469
Merged
Bumps [jsonschema](https://github.com/Stranger6667/jsonschema) from 0.28.1 to 0.28.2.
- [Release notes](https://github.com/Stranger6667/jsonschema/releases)
- [Changelog](https://github.com/Stranger6667/jsonschema/blob/master/CHANGELOG.md)
- [Commits](Stranger6667/jsonschema@rust-v0.28.1...rust-v0.28.2)

---
updated-dependencies:
- dependency-name: jsonschema
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
jqnatividad added a commit that referenced this pull request on May 26, 2026
…ven validation, force:true, UUID URL-title walk-up, GSA-bundle deferral) (#3904)

* feat(profile): walk past UUID-like basenames in url_title_default (§5.9)

For CKAN-style `/datastore/dump/<uuid>` URLs the leaf basename is an opaque UUID: better than the random tempfile suffix but still not a usable title. Walk one level up (capped at 3) and return the first non-UUID-like segment we find. The classic CKAN dump URL now yields "dump" instead of a 36-char hex.

New helper `is_uuid_like()` matches:
- canonical 8-4-4-4-12 hex with dashes
- compact 32 contiguous hex characters

Both case-insensitive. Other ID-like patterns (MongoDB ObjectId at 24 hex, ULIDs, slugified IDs) are intentionally NOT matched; over-eager matching would walk past legitimate titles like "2024-Q3".

Behavior:
  /datastore/dump/<uuid>               -> "dump" (was: uuid)
  /path/snapshots/<32-hex>             -> "snapshots" (was: hex)
  /datastore/dump/2024-Q3-payments.csv -> "2024-Q3-payments" (unchanged)
  /<uuid>/<uuid>/<uuid>                -> leaf uuid (fallback after cap)
  36-char non-hex string               -> unchanged (length-collision check)

If every candidate up the 3-level cap is UUID-like, falls back to the leaf UUID: still reproducible, still beats the tempfile suffix. Users wanting a prettier title supply `--initial-context.package.title`; a CKAN `/api/3/action/resource_show?id=<uuid>` lookup is a deferred follow-up.

The previous `url_title_preserves_uuid_basename_unchanged` test documented the old behavior. It is replaced with four new tests covering the walk, the all-UUID fallback, the normal-basename regression check (including a 36-char non-hex length-collision case), and an `is_uuid_like` unit-level matrix of positives + negatives.

Verified: 99 profile unit tests pass; 15 integration tests pass under both -F all_features and -F datapusher_plus. cargo +nightly fmt clean.
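The walk-up described in this commit can be sketched as follows. This is a minimal std-only illustration, not the code in `url_title_default`: `title_from_path` and `leaf_stem` are hypothetical names, the input is assumed to be the URL path only, and the cap of 3 is assumed to count the leaf as the first candidate.

```rust
/// True for canonical 8-4-4-4-12 hex with dashes, or 32 contiguous hex
/// characters. Case-insensitive; other ID shapes are deliberately not matched.
fn is_uuid_like(s: &str) -> bool {
    let bytes = s.as_bytes();
    match bytes.len() {
        32 => bytes.iter().all(|b| b.is_ascii_hexdigit()),
        36 => bytes.iter().enumerate().all(|(i, b)| match i {
            8 | 13 | 18 | 23 => *b == b'-',
            _ => b.is_ascii_hexdigit(),
        }),
        _ => false,
    }
}

/// Hypothetical helper: drop a trailing `.ext` from the leaf segment.
fn leaf_stem(leaf: &str) -> &str {
    match leaf.rsplit_once('.') {
        Some((stem, _ext)) if !stem.is_empty() => stem,
        _ => leaf,
    }
}

/// Hypothetical stand-in for the title walk: try the leaf stem, then parent
/// segments, up to three candidates in total (assumption). Falls back to the
/// leaf stem when every candidate is UUID-like.
fn title_from_path(path: &str) -> Option<String> {
    const MAX_CANDIDATES: usize = 3;
    let segments: Vec<&str> = path.split('/').filter(|s| !s.is_empty()).collect();
    let leaf: &str = segments.last().copied()?;
    let stem = leaf_stem(leaf);
    let parents = segments.iter().rev().skip(1).copied();
    let title = std::iter::once(stem)
        .chain(parents)
        .take(MAX_CANDIDATES)
        .find(|seg| !is_uuid_like(seg))
        .unwrap_or(stem);
    Some(title.to_string())
}
```

Checking length first keeps the 36-character non-hex case cheap to reject, which is the length-collision case the behavior table calls out.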
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(profile): add sibling-URL + JSON-LD DCAT discovery (§5.2)

Wires two more mechanisms into dcat_discover::discover, chained in priority order after the existing Link: rel=describedBy probe:

2. Sibling URLs by convention (qsv profile follow-ups §5.2). Four candidates tried in order:
   - <url>.metadata.json (qsv profile's own output naming)
   - <url>.dcat.json (common DCAT-JSON convention)
   - <dirname>/datapackage.json (Frictionless Data Package spec)
   - <host>/.well-known/data.json (DCAT-US site catalog)
3. HTML JSON-LD <script type="application/ld+json"> blocks in the URL's parent (landing-page) HTML. Open-data portals typically host the dataset page one level above the raw CSV download.

Implementation:
- New `discover_via_sibling_urls` + `sibling_candidates` helper. Hand-rolled .metadata.json/.dcat.json suffixing preserves query strings (textual append); url::Url-based construction for the datapackage.json and /.well-known/data.json variants drops query & fragment since they're host-relative, not input-relative.
- New `discover_via_html_jsonld` + `extract_jsonld_blocks` helper. Pure-string HTML scan (no parser dep): locate <script ...> tags, case-insensitive type-attribute check for application/ld+json, parse the body as JSON, run through extract_dcat_dataset (which already handles @graph envelopes + bare-object shape fallback). Skips response if neither Content-Type nor body sniff suggests HTML, which avoids wasted scans on PDFs or binary blobs served with no Content-Type.
- New `fetch_json_and_extract` shared GET helper, mirroring discover_via_link_header's 4 MiB body cap.

Module doc comment updated: the §5.2 "follow-up" markers are replaced with the new active descriptions.

Nine unit tests added (sibling_candidates × 3, extract_jsonld_blocks × 6), all pure-string, no network.
Covers typical CSV URL, query+fragment stripping, host-only URLs, basic <script> match, mixed-case type attribute, walking past non-dataset blocks, no-match negative, unrelated <script> tags, and the @graph envelope variant.

Verified: 108 profile unit tests pass (was 99, +9 new); 15 integration tests pass under both -F all_features and -F datapusher_plus. cargo +nightly fmt + clippy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(profile): run qsv validate when spec declares validators (§5.8)

When the scheming spec declares one or more `validators` on any field (dataset_fields or resource_fields), invoke `qsv validate` against the input and merge any RFC4180 failures into dcat_warnings. The presence of validators is the trigger; their string content isn't interpreted yet. Auto-generating a JSON Schema from declared types + CKAN validators is a future enhancement, but the architectural hook is in place.

Implementation:
- `Spec::has_validators()` walks both dataset_fields and resource_fields, returns true if any field's extras carry a non-empty, non-whitespace `validators` string. Whitespace-only entries are intentionally treated as "not declared" so empty but present entries don't accidentally trigger.
- `run_profile_validation(input_path) -> Vec<DcatWarning>` spawns `qsv validate <input>` directly (not via util::run_qsv_cmd, which errors on non-zero exit; the validate path needs to succeed when the subprocess fails). Best-effort: spawn errors, missing binary, or non-UTF-8 stderr all silently degrade to "no warnings". Emits a `qsv profile: ran `validate`` status line on stderr, mirroring the existing `ran `frequency`` / `ran `count`` markers so the helper's invocation is observable.
- Wired into the existing dcat_warnings merge block in profile.rs::run, alongside the build-time warning filter and --validate-dcat schema-violation path. Independent of --validate-dcat (which validates the emitted dcat block, not the input CSV).
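The `has_validators` trigger from the first bullet above can be sketched like this. It is a minimal illustration under assumptions: the `Field` and `Spec` shapes (extras as a string map) and the `field` constructor are hypothetical stand-ins, not the real scheming-spec types.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a scheming field: only the extras map matters here.
struct Field {
    extras: HashMap<String, String>,
}

/// Hypothetical stand-in for the spec: the two field lists the commit names.
struct Spec {
    dataset_fields: Vec<Field>,
    resource_fields: Vec<Field>,
}

impl Spec {
    /// True if any dataset or resource field declares a `validators` string
    /// that is non-empty after trimming whitespace.
    fn has_validators(&self) -> bool {
        self.dataset_fields
            .iter()
            .chain(self.resource_fields.iter())
            .any(|f| {
                f.extras
                    .get("validators")
                    .is_some_and(|v| !v.trim().is_empty())
            })
    }
}

/// Hypothetical test helper: build a field with an optional `validators` entry.
fn field(validators: Option<&str>) -> Field {
    let mut extras = HashMap::new();
    if let Some(v) = validators {
        extras.insert("validators".to_string(), v.to_string());
    }
    Field { extras }
}
```

Trimming before the emptiness check is what makes a present-but-blank `validators` entry count as "not declared".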
Failures land as DcatWarning entries with:
  field    = "qsv:validation"
  severity = Required
  message  = "input failed `qsv validate` (RFC4180): <detail>"

Tests:
- Four unit tests on Spec::has_validators: dataset-side trigger, resource-side trigger, none-declared negative, whitespace-only negative.
- Two integration tests on the trigger plumbing: profile_runs_validation_when_spec_declares_validators (clean CSV + druf spec → validate spawns, no qsv:validation warning), profile_skips_validation_when_spec_has_no_validators (spec-less → validate must NOT spawn).

Verified: 112 profile unit tests pass (was 108, +4); 17 integration tests pass under both -F all_features and -F datapusher_plus. cargo +nightly fmt + clippy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(profile): explain why GSA bundle vendoring is deferred (§5.3)

The handoff suggested vendoring the full GSA dcat-us JSON Schema suite as a drop-in replacement for embedded_minimal_schema. While investigating, hit a fundamental shape mismatch the original plan didn't account for: the GSA bundle is written against the **unprefixed** JSON-LD-expanded form (`otherIdentifier`, `@type: "Dataset"`) while `dcat::build` emits the **prefixed JSON-LD-compact** form (`dct:identifier`, `@type: "dcat:Dataset"`). Naïvely vendoring the bundle and pointing the validator at it would flag every key as missing.

Updated the dcat_validate module-level doc comment to spell out the three real paths forward (JSON-LD expansion, key translation layer, refactor dcat::build to emit expanded form) and why each is bigger scope than a vendor-and-swap. Embedded minimal schema stays in place; it catches the mandatory-field class of mistake cheaply.

No code changes; doc-only commit so the next maintainer doesn't re-do the same investigation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(profile): honor force:true in dataset_info at merge time (§5.4)

Wraps the existing `{value, force: true}` plumbing with real merge-time effect for `dataset_info` JSON-Pointer entries. Discovered DCAT (Link header / sibling URL / JSON-LD <script>) will no longer overlay paths the user marked forced, even when the inferred projection left them absent.

Use case: declare a field "intentionally absent" and prevent publisher DCAT discovery from silently filling it in. Example:

  {"dataset_info": {"/dcat/dct:rights": {"value": null, "force": true}}}

yields literal `null` at `/dcat/dct:rights` AND blocks any discovered `dct:rights` from being merged in.

Implementation:
- `collect_forced_dataset_info_paths(raw)` walks the `dataset_info` subtree BEFORE `normalize_value_force` strips the wrappers and collects pointer paths whose value matched the exact two-key `{"value": ..., "force": true}` shape. `force: false` and plain values aren't collected.
- `load_initial_context` signature extended: returns `(package, resource, dataset_info, forced_dcat_paths)`. The previous wrapper-stripping behavior is unchanged.
- `AnalysisContext` gains `forced_dcat_paths: Vec<String>` so the orchestrator can hand it to `merge_discovered`.
- `merge_discovered(inferred, discovered, &forced_dcat_paths)` now skips each discovered top-level key whose translated path (`/dcat/<key>`) equals or prefixes any forced path. Nested forces (e.g. `/dcat/dcat:contactPoint/vcard:fn`) block the whole-object overlay since `merge_discovered` operates at the top level; nested-leaf force is satisfied by the later pointer-override pass.

Scope-limit: force on `package` / `resource` initial-context entries is still accepted and stripped but NOT honored at merge time. That needs a CKAN→DCAT JSON-Pointer mapping table (documented in `load_initial_context`'s comment as a deferred follow-up).
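The skip rule in `merge_discovered` can be sketched as below. This is a simplified std-only model under stated assumptions: documents are flattened to string maps rather than JSON values, `is_blocked` is a hypothetical helper name, discovered values are assumed to fill only keys absent from the inferred side, "prefixes" is assumed to mean a prefix at a `/` segment boundary, and keys are assumed to contain no `/` or `~`.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: a discovered top-level key is blocked when its
/// translated path `/dcat/<key>` equals a forced path, or is a parent of one
/// (assumption: the prefix match is checked at a `/` boundary).
fn is_blocked(key: &str, forced_paths: &[String]) -> bool {
    let candidate = format!("/dcat/{key}");
    let nested_prefix = format!("{candidate}/");
    forced_paths
        .iter()
        .any(|p| *p == candidate || p.starts_with(&nested_prefix))
}

/// Simplified model of the merge: values are plain strings here, whereas the
/// real function works on JSON. Assumption: discovered entries only fill keys
/// that the inferred side left absent.
fn merge_discovered(
    inferred: &mut BTreeMap<String, String>,
    discovered: &BTreeMap<String, String>,
    forced_paths: &[String],
) {
    for (key, value) in discovered {
        if is_blocked(key, forced_paths) {
            continue; // the user forced this path; do not overlay it
        }
        inferred.entry(key.clone()).or_insert_with(|| value.clone());
    }
}
```

Matching on `candidate + "/"` rather than a bare string prefix keeps a forced `/dcat/dct:rightsHolder` from blocking `dct:rights`.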
USAGE is updated to spell out the new dataset_info behavior and the package/resource gap.

Tests:
- 3 unit tests on `collect_forced_dataset_info_paths`: dataset_info collection with mixed wrapper / plain / force:false / null-value-force shapes, no-dataset_info, pathological non-object dataset_info.
- 4 unit tests on `merge_discovered`: forced top-level key blocks overlay; forced nested path blocks the whole-object overlay; unrelated discovered keys still fill when one is forced; forced paths outside the /dcat subtree are ignored.
- 1 integration test exercising the full flow against the qsv binary: initial-context with `{value: "MIT IRI", force: true}` for dct:license (lands via pointer override) and `{value: null, force: true}` for dct:rights (null round-trips, force blocks hypothetical discovery overlay).

Verified: 119 profile unit tests pass (was 112, +7); 18 integration tests pass under both -F all_features and -F datapusher_plus (was 17, +1). cargo +nightly fmt + clippy clean, docs/help regenerated, docs-drift-check reports no drift.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(profile): RFC 6901 escape discovered keys in merge_discovered (roborev #2469)

One Medium finding on the §5.4 commit: the candidate JSON-Pointer path built from each discovered DCAT key was interpolated directly without RFC 6901 token escaping. A user wanting to force a JSON-LD property whose key contains `/` or `~` (full IRIs like `http://purl.org/dc/terms/title`, the rare CURIE-with-tilde) would write the path in its escaped form (`/dcat/http:~1~1purl.org~1dc~1terms~1title`), but our candidate construction produced the un-escaped raw form (`/dcat/http://purl.org/dc/terms/title`): too many pointer segments, never matches, force is silently ignored.

Fix:
- New `escape_json_pointer_token` helper that applies RFC 6901 section 4 escaping (`~` → `~0`, `/` → `~1`) in the correct order (`~` first, otherwise the `~1` from a `/` would get double-escaped to `~01`).
- `merge_discovered` builds `candidate = format!("/dcat/{}", escape_json_pointer_token(k))` so the comparison stays in the canonical escaped JSON-Pointer space.

Tests (3 new in src/cmd/profile.rs::tests):
- merge_force_match_handles_full_iri_keys_via_rfc6901_escaping: forced path `/dcat/http:~1~1purl.org~1dc~1terms~1title` correctly blocks the discovered `http://purl.org/dc/terms/title` overlay.
- merge_force_does_not_match_unrelated_keys_after_escaping: regression check that the same escaping doesn't over-eagerly match an unrelated `dct:identifier` key.
- escape_json_pointer_token_matches_rfc6901: unit-level matrix covering plain, /-only, ~-only, the tricky `~/` ordering trap (must yield `~0~1`, not `~01`), and the full-IRI case.

Verified: 122 profile unit tests pass (was 119, +3); 17 integration tests pass under both -F all_features and -F datapusher_plus. cargo +nightly fmt + clippy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* address copilot review: URL-safe sibling candidates + validate flag forwarding

  * dcat_discover::sibling_candidates: build all four candidates via `url::Url` parsing so query strings and fragments on the input URL don't get baked into the appended suffix. An input like `snapshot.csv?token=abc#frag` was producing `snapshot.csv?token=abc.metadata.json`, which servers interpreted as a GET on the CSV with a polluted query value rather than a fetch of the sibling JSON. Falls back to textual append only when the URL fails to parse. Updated the corresponding test to assert the new behavior for all four candidate slots.
  * profile::run_profile_validation: forward `--no-headers` and `--delimiter` to `qsv validate` so it parses the input the same way the rest of the profile pipeline (stats/frequency/count) does. Without this, non-default CSV options would yield spurious or missed RFC4180 failures.
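The RFC 6901 token escaping from the `merge_discovered` fix earlier in this message can be sketched in a few lines. The function name follows the commit text, but the body is an illustrative guess at the shape rather than the code in `src/cmd/profile.rs`, and `candidate_path` is a hypothetical wrapper around the `format!` call the commit quotes.

```rust
/// Escape one JSON-Pointer reference token per RFC 6901:
/// `~` becomes `~0` first, then `/` becomes `~1`. Reversing the order would
/// re-escape the `~` introduced by `~1` and yield `~01` for a `/`.
fn escape_json_pointer_token(token: &str) -> String {
    token.replace('~', "~0").replace('/', "~1")
}

/// Hypothetical wrapper: build the candidate path that is compared against
/// the user's forced paths, in escaped JSON-Pointer space.
fn candidate_path(discovered_key: &str) -> String {
    format!("/dcat/{}", escape_json_pointer_token(discovered_key))
}
```

Because both sides of the comparison are in escaped form, a full-IRI key stays a single pointer segment under `/dcat`.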
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(profile): regression test for `validate` --delimiter forwarding (roborev #2471)

Roborev flagged the new `--no-headers` / `--delimiter` forwarding path in `run_profile_validation` as uncovered: the existing validation test only exercised default comma-delimited input with headers, so it would still pass if the forwarded args were dropped or misordered.

The new test uses a `;`-delimited CSV whose rows contain unquoted commas. When parsed as the default `,`-delimited, field counts mismatch the 1-field header and `qsv validate` emits an RFC4180 record-length failure. When parsed with `;`, the six fields per row line up and validation passes. Asserting the absence of a `qsv:validation` warning on this input proves the `--delimiter ;` flag was forwarded to the spawned `qsv validate`.

Verified by running `qsv validate` directly on the same content with and without `--delimiter ;`: exit 0 with the flag, exit 1 without it, confirming the test would fail if the forwarding were ever removed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
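Why the `;`-delimited fixture distinguishes the two parses can be seen with a naive field count. This is only a sketch: the splitter ignores quoting and is not how `qsv validate` parses CSV, and `SAMPLE` is made-up data shaped like the fixture the commit describes (six `;`-separated fields, unquoted commas in the rows, none in the header).

```rust
/// Made-up fixture: six `;`-separated fields per record, with unquoted commas
/// inside some row values but none in the header.
const SAMPLE: &str = "id;name;city;amount;currency;note\n\
1;Doe, Jane;Austin, TX;10;USD;ok\n\
2;Roe, Rick;Reno, NV;20;USD;late\n";

/// Naive field count: split on the delimiter, no quote handling.
fn field_count(line: &str, delimiter: char) -> usize {
    line.split(delimiter).count()
}

/// True when every non-empty record has the same field count as the header,
/// which is the record-length property the regression test relies on.
fn record_lengths_match(csv: &str, delimiter: char) -> bool {
    let mut lines = csv.lines().filter(|l| !l.is_empty());
    let Some(header) = lines.next() else {
        return true; // empty input: nothing to mismatch
    };
    let expected = field_count(header, delimiter);
    lines.all(|l| field_count(l, delimiter) == expected)
}
```

With `;` every record has six fields; with `,` the header is one field while each row splits into three, so the lengths disagree.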
Bumps jsonschema from 0.28.1 to 0.28.2.
Release notes
Sourced from jsonschema's releases.
Changelog
Sourced from jsonschema's changelog.
Commits
- 7c59034 chore(rust): Release 0.28.2
- 615fa1e fix: Resolving relative references with fragments
- 1b75969 build(deps): bump crates/jsonschema-referencing/tests/suite
- 73a2e6f perf: Faster JSON pointer resolution
- 210eebb chore: Clippy lints
- 7a2ac3e fix: Resolving external references nested inside local references
- 29b37f0 chore(python): Release 0.28.1

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)