
build(deps): bump jsonschema from 0.28.1 to 0.28.2 #2469

Merged
jqnatividad merged 1 commit into master from dependabot/cargo/jsonschema-0.28.2 on Jan 22, 2025

Conversation

@dependabot (bot) commented on behalf of github on Jan 22, 2025

Bumps jsonschema from 0.28.1 to 0.28.2.

Release notes

Sourced from jsonschema's releases.

[Python] Release 0.28.2

Fixed

  • Resolving external references that are nested inside local references. #671
  • Resolving relative references with fragments against base URIs that also contain fragments. #666

Performance

  • Faster JSON pointer resolution.

[Rust] Release 0.28.2

Fixed

  • Resolving external references that are nested inside local references. #671
  • Resolving relative references with fragments against base URIs that also contain fragments. #666

Performance

  • Faster JSON pointer resolution.
Changelog

Sourced from jsonschema's changelog.

[0.28.2] - 2025-01-22

Fixed

  • Resolving external references that are nested inside local references. #671
  • Resolving relative references with fragments against base URIs that also contain fragments. #666

Performance

  • Faster JSON pointer resolution.
Commits
  • 7c59034 chore(rust): Release 0.28.2
  • 615fa1e fix: Resolving relative references with fragments
  • 1b75969 build(deps): bump crates/jsonschema-referencing/tests/suite
  • 73a2e6f perf: Faster JSON pointer resolution
  • 210eebb chore: Clippy lints
  • 7a2ac3e fix: Resolving external references nested inside local references
  • 29b37f0 chore(python): Release 0.28.1
  • See full diff in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [jsonschema](https://github.com/Stranger6667/jsonschema) from 0.28.1 to 0.28.2.
- [Release notes](https://github.com/Stranger6667/jsonschema/releases)
- [Changelog](https://github.com/Stranger6667/jsonschema/blob/master/CHANGELOG.md)
- [Commits](Stranger6667/jsonschema@rust-v0.28.1...rust-v0.28.2)

---
updated-dependencies:
- dependency-name: jsonschema
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
@dependabot added the dependencies and rust labels on Jan 22, 2025
@jqnatividad merged commit 070b743 into master on Jan 22, 2025
@jqnatividad deleted the dependabot/cargo/jsonschema-0.28.2 branch on January 22, 2025 13:05
jqnatividad added a commit that referenced this pull request May 26, 2026
…ven validation, force:true, UUID URL-title walk-up, GSA-bundle deferral) (#3904)

* feat(profile): walk past UUID-like basenames in url_title_default (§5.9)

For CKAN-style `/datastore/dump/<uuid>` URLs the leaf basename is
an opaque UUID — better than the random tempfile suffix but still
not a usable title. Walk up one path level at a time (capped at 3) and
return the first non-UUID-like segment found. The classic CKAN dump URL
now yields "dump" instead of a 36-char UUID string.

New helper `is_uuid_like()` matches:
  - canonical 8-4-4-4-12 hex with dashes
  - compact 32 contiguous hex characters
Both case-insensitive. Other ID-like patterns (MongoDB ObjectId at
24 hex, ULIDs, slugified IDs) are intentionally NOT matched —
over-eager matching would walk past legitimate titles like
"2024-Q3".

Behavior:
  /datastore/dump/<uuid>               -> "dump"          (was: uuid)
  /path/snapshots/<32-hex>             -> "snapshots"     (was: hex)
  /datastore/dump/2024-Q3-payments.csv -> "2024-Q3-payments" (unchanged)
  /<uuid>/<uuid>/<uuid>                -> leaf uuid (fallback after cap)
  36-char non-hex string               -> unchanged (length-collision check)

If every candidate up to the 3-level cap is UUID-like, the function
falls back to the leaf UUID, which is still reproducible and still beats
the tempfile suffix. Users wanting a prettier title supply
`--initial-context.package.title`; a CKAN
`/api/3/action/resource_show?id=<uuid>` lookup is a deferred
follow-up.
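
A minimal sketch of the walk-up described above, using only the standard
library. It is not the code from this commit: the function bodies, the
`MAX_WALK_UP` constant, the `stem` helper, and the choice to take a bare URL
path (rather than a full URL) are assumptions made for illustration, and the
cap is read here as "at most 3 levels above the leaf".

```rust
/// Assumed cap: how many parent segments may be inspected above the leaf.
const MAX_WALK_UP: usize = 3;

/// True for canonical 8-4-4-4-12 hex with dashes, or 32 contiguous hex
/// characters. Case-insensitive; other ID shapes are deliberately not matched.
fn is_uuid_like(s: &str) -> bool {
    let bytes = s.as_bytes();
    match bytes.len() {
        32 => bytes.iter().all(u8::is_ascii_hexdigit),
        36 => bytes.iter().enumerate().all(|(i, b)| match i {
            8 | 13 | 18 | 23 => *b == b'-',
            _ => b.is_ascii_hexdigit(),
        }),
        _ => false,
    }
}

/// Hypothetical helper: drop a trailing extension ("a.csv" -> "a").
fn stem(segment: &str) -> &str {
    match segment.rsplit_once('.') {
        Some((s, _)) if !s.is_empty() => s,
        _ => segment,
    }
}

/// Pick a title from a URL path, walking past UUID-like segments.
/// Falls back to the leaf when every inspected segment is UUID-like.
fn url_title_default(path: &str) -> Option<String> {
    let segments: Vec<&str> = path.split('/').filter(|s| !s.is_empty()).collect();
    let leaf = stem(segments.last().copied()?);
    let title = segments
        .iter()
        .rev()
        .copied()
        .take(1 + MAX_WALK_UP) // the leaf plus up to MAX_WALK_UP parents
        .map(stem)
        .find(|s| !is_uuid_like(s))
        .unwrap_or(leaf);
    Some(title.to_string())
}
```

Under these assumptions, `url_title_default("/datastore/dump/2024-Q3-payments.csv")`
returns `Some("2024-Q3-payments")`, since the leaf is not UUID-like and no walk
is needed.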

The previous `url_title_preserves_uuid_basename_unchanged` test
documented the old behavior — replaced with four new tests covering
the walk, the all-UUID fallback, the normal-basename regression
check (including a 36-char non-hex length-collision case), and an
`is_uuid_like` unit-level matrix of positives + negatives.

Verified: 99 profile unit tests pass; 15 integration tests pass
under both -F all_features and -F datapusher_plus. cargo +nightly
fmt clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(profile): add sibling-URL + JSON-LD DCAT discovery (§5.2)

Wires two more mechanisms into dcat_discover::discover, chained in
priority order after the existing Link: rel=describedBy probe:

  2. Sibling URLs by convention (qsv profile follow-ups §5.2). Four
     candidates tried in order:
       - <url>.metadata.json     (qsv profile's own output naming)
       - <url>.dcat.json         (common DCAT-JSON convention)
       - <dirname>/datapackage.json (Frictionless Data Package spec)
       - <host>/.well-known/data.json (DCAT-US site catalog)

  3. HTML JSON-LD <script type="application/ld+json"> blocks in the
     URL's parent (landing-page) HTML. Open-data portals typically
     host the dataset page one level above the raw CSV download.

Implementation:
- New `discover_via_sibling_urls` + `sibling_candidates` helper.
  Hand-rolled .metadata.json/.dcat.json suffixing preserves query
  strings (textual append); url::Url-based construction for the
  datapackage.json and /.well-known/data.json variants drops query
  & fragment since they're host-relative, not input-relative.
- New `discover_via_html_jsonld` + `extract_jsonld_blocks` helper.
  Pure-string HTML scan (no parser dep): locate <script ...> tags,
  case-insensitive type-attribute check for application/ld+json,
  parse the body as JSON, run through extract_dcat_dataset (which
  already handles @graph envelopes + bare-object shape fallback).
  Skips response if neither Content-Type nor body sniff suggests
  HTML — avoids wasted scans on PDFs or binary blobs served with
  no Content-Type.
- New `fetch_json_and_extract` shared GET helper, mirroring
  discover_via_link_header's 4 MiB body cap.
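
The pure-string `<script>` scan can be sketched as follows. This is an
illustrative approximation, not the commit's `extract_jsonld_blocks`: the
function name `extract_jsonld_bodies` is hypothetical, it checks for
`application/ld+json` anywhere in the opening tag rather than parsing the
`type` attribute, and it returns raw bodies without the JSON parse or the
`extract_dcat_dataset` step.

```rust
/// Return the trimmed bodies of every <script> block whose opening tag
/// mentions application/ld+json (case-insensitive). No HTML parser is used.
fn extract_jsonld_bodies(html: &str) -> Vec<&str> {
    // ASCII lowercasing keeps byte offsets identical to the original string,
    // so indices found in `lower` can be used to slice `html`.
    let lower = html.to_ascii_lowercase();
    let mut bodies = Vec::new();
    let mut pos = 0;
    while let Some(rel) = lower[pos..].find("<script") {
        let open = pos + rel;
        // End of the opening tag.
        let Some(rel) = lower[open..].find('>') else { break };
        let tag_end = open + rel;
        // Start of the matching closing tag.
        let Some(rel) = lower[tag_end..].find("</script") else { break };
        let close = tag_end + rel;
        if lower[open..tag_end].contains("application/ld+json") {
            bodies.push(html[tag_end + 1..close].trim());
        }
        pos = close + "</script".len();
    }
    bodies
}
```

Each returned body would then be handed to a JSON parser; blocks that fail to
parse or do not describe a dataset would simply be skipped.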

Module doc comment updated: the §5.2 "follow-up" markers are
replaced with the new active descriptions.

Nine unit tests added (sibling_candidates × 3,
extract_jsonld_blocks × 6) — pure-string, no network. Covers
typical CSV URL, query+fragment stripping, host-only URLs, basic
<script> match, mixed-case type attribute, walking past
non-dataset blocks, no-match negative, unrelated <script> tags,
and the @graph envelope variant.

Verified: 108 profile unit tests pass (was 99, +9 new); 15
integration tests pass under both -F all_features and -F
datapusher_plus. cargo +nightly fmt + clippy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(profile): run qsv validate when spec declares validators (§5.8)

When the scheming spec declares one or more `validators` on any
field (dataset_fields or resource_fields), invoke `qsv validate`
against the input and merge any RFC4180 failures into
dcat_warnings. The presence of validators is the trigger; their
string content isn't interpreted yet — auto-generating a JSON
Schema from declared types + CKAN validators is a future
enhancement, but the architectural hook is in place.

Implementation:
- `Spec::has_validators()` walks both dataset_fields and
  resource_fields, returns true if any field's extras carry a
  non-empty, non-whitespace `validators` string. Whitespace-only
  entries are intentionally treated as "not declared" so empty
  but present entries don't accidentally trigger.
- `run_profile_validation(input_path) -> Vec<DcatWarning>` spawns
  `qsv validate <input>` directly (not via util::run_qsv_cmd,
  which errors on non-zero exit — the validate path needs to
  succeed when the subprocess fails). Best-effort: spawn errors,
  missing binary, or non-UTF-8 stderr all silently degrade to
  "no warnings". Emits a `qsv profile: ran `validate`` status
  line on stderr, mirroring the existing `ran `frequency`` /
  `ran `count`` markers so the helper's invocation is observable.
- Wired into the existing dcat_warnings merge block in
  profile.rs::run, alongside the build-time warning filter and
  --validate-dcat schema-violation path. Independent of
  --validate-dcat (which validates the emitted dcat block, not
  the input CSV).
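
The trigger check might look roughly like this. The `Field` and `Spec`
shapes below are simplified stand-ins invented for the sketch (the real
types carry more than a string map of extras); only the rule itself, a
non-empty and non-whitespace `validators` entry on any dataset or resource
field, comes from the description above.

```rust
use std::collections::HashMap;

/// Simplified stand-in for a scheming field; `extras` is an assumed shape.
struct Field {
    extras: HashMap<String, String>,
}

/// Simplified stand-in for the scheming spec.
struct Spec {
    dataset_fields: Vec<Field>,
    resource_fields: Vec<Field>,
}

impl Spec {
    /// True if any field on either side declares a `validators` string
    /// that is not empty and not whitespace-only.
    fn has_validators(&self) -> bool {
        self.dataset_fields
            .iter()
            .chain(&self.resource_fields)
            .any(|f| {
                f.extras
                    .get("validators")
                    .is_some_and(|v| !v.trim().is_empty())
            })
    }
}
```

Trimming before the emptiness check is what keeps a present but blank
`validators` entry from triggering a validation run.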

Failures land as DcatWarning entries with:
  field = "qsv:validation"
  severity = Required
  message = "input failed `qsv validate` (RFC4180): <detail>"
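
A best-effort subprocess call with this warning shape could be sketched as
below. The `DcatWarning` and `Severity` definitions are guesses at the
minimum needed for the example, and the extra `program` parameter is
hypothetical (the commit's function takes only the input path); it is there
so the degrade-to-empty path can be exercised without a qsv binary.

```rust
use std::process::Command;

#[derive(Debug, PartialEq)]
enum Severity {
    Required,
}

#[derive(Debug, PartialEq)]
struct DcatWarning {
    field: String,
    severity: Severity,
    message: String,
}

/// Run `<program> validate <input_path>` and turn a failing exit status
/// into a single warning. Spawn errors and non-UTF-8 stderr yield no warnings.
fn run_profile_validation(program: &str, input_path: &str) -> Vec<DcatWarning> {
    let Ok(output) = Command::new(program)
        .arg("validate")
        .arg(input_path)
        .output()
    else {
        return Vec::new(); // missing binary or spawn failure: degrade silently
    };
    if output.status.success() {
        return Vec::new(); // input passed validation
    }
    let Ok(detail) = String::from_utf8(output.stderr) else {
        return Vec::new(); // unreadable stderr: degrade silently
    };
    vec![DcatWarning {
        field: "qsv:validation".to_string(),
        severity: Severity::Required,
        message: format!("input failed `qsv validate` (RFC4180): {}", detail.trim()),
    }]
}
```

Calling `Command::output` directly, rather than a wrapper that treats a
non-zero exit as an error, is what lets a failed validation become a warning
instead of aborting the profile run.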

Tests:
- Four unit tests on Spec::has_validators: dataset-side trigger,
  resource-side trigger, none-declared negative, whitespace-only
  negative.
- Two integration tests on the trigger plumbing:
  profile_runs_validation_when_spec_declares_validators (clean
  CSV + druf spec → validate spawns, no qsv:validation warning),
  profile_skips_validation_when_spec_has_no_validators (spec-less
  → validate must NOT spawn).

Verified: 112 profile unit tests pass (was 108, +4); 17 integration
tests pass under both -F all_features and -F datapusher_plus.
cargo +nightly fmt + clippy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(profile): explain why GSA bundle vendoring is deferred (§5.3)

The handoff suggested vendoring the full GSA dcat-us JSON Schema
suite as a drop-in replacement for embedded_minimal_schema. While
investigating, hit a fundamental shape mismatch the original plan
didn't account for: the GSA bundle is written against the
**unprefixed** JSON-LD-expanded form (`otherIdentifier`, `@type:
"Dataset"`) while `dcat::build` emits the **prefixed JSON-LD-compact**
form (`dct:identifier`, `@type: "dcat:Dataset"`). Naïvely vendoring
the bundle and pointing the validator at it would flag every key
as missing.

Updated the dcat_validate module-level doc comment to spell out
the three real paths forward (JSON-LD expansion, key translation
layer, refactor dcat::build to emit expanded form) and why each
is bigger scope than a vendor-and-swap. Embedded minimal schema
stays in place — it catches the mandatory-field class of mistake
cheaply.

No code changes; doc-only commit so the next maintainer doesn't
re-do the same investigation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(profile): honor force:true in dataset_info at merge time (§5.4)

Wraps the existing `{value, force: true}` plumbing with real
merge-time effect for `dataset_info` JSON-Pointer entries.
Discovered DCAT (Link header / sibling URL / JSON-LD <script>)
will no longer overlay paths the user marked forced — even when
the inferred projection left them absent.

Use case: declare a field "intentionally absent" and prevent
publisher DCAT discovery from silently filling it in. Example:
  {"dataset_info":
     {"/dcat/dct:rights": {"value": null, "force": true}}}
yields literal `null` at `/dcat/dct:rights` AND blocks any
discovered `dct:rights` from being merged in.

Implementation:
- `collect_forced_dataset_info_paths(raw)` walks the `dataset_info`
  subtree BEFORE `normalize_value_force` strips the wrappers and
  collects pointer paths whose value matched the exact two-key
  `{"value": ..., "force": true}` shape. `force: false` and plain
  values aren't collected.
- `load_initial_context` signature extended: returns
  `(package, resource, dataset_info, forced_dcat_paths)`. The
  previous wrapper-stripping behavior is unchanged.
- `AnalysisContext` gains `forced_dcat_paths: Vec<String>` so the
  orchestrator can hand it to `merge_discovered`.
- `merge_discovered(inferred, discovered, &forced_dcat_paths)` now
  skips each discovered top-level key whose translated path
  (`/dcat/<key>`) equals or prefixes any forced path. Nested
  forces (e.g. `/dcat/dcat:contactPoint/vcard:fn`) block the
  whole-object overlay since `merge_discovered` operates at the
  top level — nested-leaf force is satisfied by the later
  pointer-override pass.
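
The skip rule for discovered keys can be illustrated with a small
standard-library sketch. The names `is_blocked` and `mergeable_keys` are
hypothetical, discovered DCAT is reduced to a list of top-level key names,
and the key is interpolated into the path as-is with no JSON-Pointer
escaping.

```rust
/// True if the discovered top-level `key` must not be overlaid: its
/// translated path `/dcat/<key>` equals a forced path, or is a parent
/// of one (a nested force blocks the whole-object overlay).
fn is_blocked(key: &str, forced_paths: &[String]) -> bool {
    let candidate = format!("/dcat/{key}");
    let nested_prefix = format!("{candidate}/");
    forced_paths
        .iter()
        .any(|p| *p == candidate || p.starts_with(&nested_prefix))
}

/// Keep only the discovered keys that are still allowed to merge.
fn mergeable_keys<'a>(discovered: &[&'a str], forced_paths: &[String]) -> Vec<&'a str> {
    discovered
        .iter()
        .copied()
        .filter(|k| !is_blocked(k, forced_paths))
        .collect()
}
```

Comparing against `candidate + "/"` rather than a bare prefix avoids a false
match between sibling keys such as `dct:rights` and `dct:rightsHolder`.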

Scope-limit: force on `package` / `resource` initial-context
entries is still accepted and stripped but NOT honored at merge
time — that needs a CKAN→DCAT JSON-Pointer mapping table
(documented in `load_initial_context`'s comment as a deferred
follow-up). USAGE is updated to spell out the new dataset_info
behavior and the package/resource gap.

Tests:
- 3 unit tests on `collect_forced_dataset_info_paths`:
  dataset_info collection with mixed wrapper / plain / force:false
  / null-value-force shapes, no-dataset_info, pathological
  non-object dataset_info.
- 4 unit tests on `merge_discovered`: forced top-level key blocks
  overlay; forced nested path blocks the whole-object overlay;
  unrelated discovered keys still fill when one is forced; forced
  paths outside the /dcat subtree are ignored.
- 1 integration test exercising the full flow against the qsv
  binary: initial-context with `{value: "MIT IRI", force: true}`
  for dct:license (lands via pointer override) and `{value: null,
  force: true}` for dct:rights (null round-trips, force blocks
  hypothetical discovery overlay).

Verified: 119 profile unit tests pass (was 112, +7); 18
integration tests pass under both -F all_features and -F
datapusher_plus (was 17, +1). cargo +nightly fmt + clippy clean,
docs/help regenerated, docs-drift-check reports no drift.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(profile): RFC 6901 escape discovered keys in merge_discovered (roborev #2469)

One Medium finding on the §5.4 commit: the candidate JSON-Pointer
path built from each discovered DCAT key was interpolated directly
without RFC 6901 token escaping. A user wanting to force a JSON-LD
property whose key contains `/` or `~` (full IRIs like
`http://purl.org/dc/terms/title`, the rare CURIE-with-tilde) would
write the path in its escaped form
(`/dcat/http:~1~1purl.org~1dc~1terms~1title`), but our candidate
construction produced the un-escaped raw form
(`/dcat/http://purl.org/dc/terms/title`) — too many pointer
segments, never matches, force is silently ignored.

Fix:
- New `escape_json_pointer_token` helper that applies RFC 6901
  section 4 escaping (`~` → `~0`, `/` → `~1`) in the correct order
  (`~` first, otherwise the `~1` from a `/` would get
  double-escaped to `~01`).
- `merge_discovered` builds `candidate = format!("/dcat/{}",
  escape_json_pointer_token(k))` so the comparison stays in the
  canonical escaped JSON-Pointer space.
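
As a sketch of the escaping step, assuming nothing beyond the two RFC 6901
substitutions named above (the body is illustrative, not copied from the
commit):

```rust
/// Escape one JSON-Pointer reference token per RFC 6901:
/// `~` becomes `~0` first, then `/` becomes `~1`.
/// Reversing the order would turn `/` into `~1` and then into `~01`.
fn escape_json_pointer_token(token: &str) -> String {
    token.replace('~', "~0").replace('/', "~1")
}
```

With this helper, the candidate path for a full-IRI key is built as
`format!("/dcat/{}", escape_json_pointer_token(key))`, so it compares equal
to a forced path the user wrote in escaped form.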

Tests (3 new in src/cmd/profile.rs::tests):
- merge_force_match_handles_full_iri_keys_via_rfc6901_escaping:
  forced path `/dcat/http:~1~1purl.org~1dc~1terms~1title` correctly
  blocks the discovered `http://purl.org/dc/terms/title` overlay.
- merge_force_does_not_match_unrelated_keys_after_escaping:
  regression check that the same escaping doesn't over-eagerly
  match an unrelated `dct:identifier` key.
- escape_json_pointer_token_matches_rfc6901: unit-level matrix —
  plain, /-only, ~-only, the tricky `~/` ordering trap (must yield
  `~0~1`, not `~01`), and the full-IRI case.

Verified: 122 profile unit tests pass (was 119, +3); 17 integration
tests pass under both -F all_features and -F datapusher_plus.
cargo +nightly fmt + clippy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* address copilot review: URL-safe sibling candidates + validate flag forwarding

* dcat_discover::sibling_candidates: build all four candidates via
  `url::Url` parsing so query strings and fragments on the input URL
  don't get baked into the appended suffix. An input like
  `snapshot.csv?token=abc#frag` was producing
  `snapshot.csv?token=abc.metadata.json`, which servers interpreted as
  a GET on the CSV with a polluted query value rather than a fetch of
  the sibling JSON. Falls back to textual append only when the URL
  fails to parse. Updated the corresponding test to assert the new
  behavior for all four candidate slots.

* profile::run_profile_validation: forward `--no-headers` and
  `--delimiter` to `qsv validate` so it parses the input the same way
  the rest of the profile pipeline (stats/frequency/count) does.
  Without this, non-default CSV options would yield spurious or
  missed RFC4180 failures.
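
The sibling-candidate fix in the first bullet can be approximated without
any dependencies. The commit itself uses `url::Url`; the sketch below swaps
in plain string splitting so it stays self-contained, which means it does no
real URL validation. The shortened fallback for inputs without a scheme is
also an assumption.

```rust
/// Build sibling metadata URLs for `input`, dropping query and fragment
/// so they are not baked into the appended suffix.
fn sibling_candidates(input: &str) -> Vec<String> {
    // Strip the fragment first, then the query string.
    let no_frag = input.split('#').next().unwrap_or(input);
    let base = no_frag.split('?').next().unwrap_or(no_frag);

    // Without a scheme we cannot find the host; fall back to textual append.
    let Some(scheme_end) = base.find("://") else {
        return vec![format!("{base}.metadata.json"), format!("{base}.dcat.json")];
    };
    let authority_start = scheme_end + 3;
    // The origin ends at the first '/' after the authority, or at end of string.
    let origin_end = base[authority_start..]
        .find('/')
        .map_or(base.len(), |i| authority_start + i);
    let origin = &base[..origin_end];
    // The directory is everything before the last '/' of the path, if any.
    let dirname = match base.rfind('/') {
        Some(i) if i >= origin_end => &base[..i],
        _ => origin,
    };
    vec![
        format!("{base}.metadata.json"),
        format!("{base}.dcat.json"),
        format!("{dirname}/datapackage.json"),
        format!("{origin}/.well-known/data.json"),
    ]
}
```

For `https://example.org/data/snapshot.csv?token=abc#frag` the first
candidate becomes `https://example.org/data/snapshot.csv.metadata.json`, with
the token and fragment dropped.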

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(profile): regression test for `validate` --delimiter forwarding (roborev #2471)

Roborev flagged the new `--no-headers` / `--delimiter` forwarding path in
`run_profile_validation` as uncovered: the existing validation test only
exercised default comma-delimited input with headers, so it would still
pass if the forwarded args were dropped or misordered.

The new test uses a `;`-delimited CSV whose rows contain unquoted
commas. When parsed with the default `,` delimiter, field counts mismatch
the 1-field header and `qsv validate` emits an RFC4180 record-length
failure. When parsed with `;`, the six fields per row line up and
validation passes. Asserting the absence of a `qsv:validation` warning
on this input proves the `--delimiter ;` flag was forwarded to the
spawned `qsv validate`.

Verified by running `qsv validate` directly on the same content with
and without `--delimiter ;`: exit 0 vs exit 1 respectively, confirming
the test would fail if the forwarding were ever removed.
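
Why this input separates the two cases can be shown with a naive field
counter. This is only an illustration of the record-length reasoning, not
how `qsv validate` works: it splits on the delimiter and ignores quoting
entirely, and the sample rows are made up.

```rust
/// Count fields per line by naively splitting on `delimiter` (no quoting rules).
fn field_counts(csv: &str, delimiter: char) -> Vec<usize> {
    csv.lines().map(|line| line.split(delimiter).count()).collect()
}

/// True when every record has the same number of fields as the header.
fn record_lengths_consistent(csv: &str, delimiter: char) -> bool {
    let counts = field_counts(csv, delimiter);
    counts.first().is_none_or(|first| counts.iter().all(|c| c == first))
}

/// Hypothetical `;`-delimited sample whose rows contain unquoted commas.
const SAMPLE: &str = "id;name;city;amount;note;flag\n\
                      1;Ann, Jr.;Oslo;10;a;y\n\
                      2;Bob, Sr.;Rome;20;c;n";
```

Split on `;`, every line of `SAMPLE` has six fields. Split on `,`, the
header has one field and each row has two, which is the record-length
mismatch the test relies on.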

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>