build(deps): bump jsonschema from 0.18.2 to 0.18.3#2134

dependabot · 2024-09-12T12:47:37Z

Bumps jsonschema from 0.18.2 to 0.18.3.

Release notes

[Python] Release 0.18.3

Fixed

Changing base URI when $ref is present in drafts 7 and earlier.

[Rust] Release 0.18.3

Fixed

Changing base URI when $ref is present in drafts 7 and earlier.

Removed dbg! macro.

Changelog

Sourced from jsonschema's changelog.

[0.18.3] - 2024-09-12

Fixed

Changing base URI when $ref is present in drafts 7 and earlier.

Removed dbg! macro.

Commits

11ebbaa chore(rust): Release 0.18.3
0d359cc fix: Remove dbg! macro
8cb04dd fix: Changing base URI when $ref is present in drafts 7 and earlier
ce678a1 build(deps): bump actions/setup-python from 4 to 5
432e434 build(deps): update boon requirement from 0.5 to 0.6 in /jsonschema
cb66c44 build(deps): bump actions/checkout from 3 to 4
3acee86 build: Create dependabot.yml
b843018 build: update codspeed
cbb86d0 chore(python): Release 0.18.2
6b0a211 fix(build): Python release workflow
See full diff in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR
@dependabot recreate will recreate this PR, overwriting any edits that have been made to it
@dependabot merge will merge this PR after your CI passes on it
@dependabot squash and merge will squash and merge this PR after your CI passes on it
@dependabot cancel merge will cancel a previously requested merge and block automerging
@dependabot reopen will reopen this PR if it is closed
@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
@dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [jsonschema](https://github.com/Stranger6667/jsonschema-rs) from 0.18.2 to 0.18.3. - [Release notes](https://github.com/Stranger6667/jsonschema-rs/releases) - [Changelog](https://github.com/Stranger6667/jsonschema-rs/blob/master/CHANGELOG.md) - [Commits](Stranger6667/jsonschema@rust-v0.18.2...rust-v0.18.3) --- updated-dependencies: - dependency-name: jsonschema dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>

…tions 5 findings on the initial commit (4d706e1) — 1 HIGH already addressed in 899de97 (index_files() Ok(None) check), 4 still open: * MEDIUM (generator.rs `parse_f64`): `parse_f64("NaN")` returned `Some(f64::NAN)`, which slipped past both the `hi < lo` short-circuit and the quartile sanity check, and would then panic in `rng.random_range`. Added `.filter(|v| v.is_finite())` so non-finite endpoints fail at parse time. Same path now rejects `±Infinity`. * LOW (USAGE / docs): "logical value maps consistently" was unqualified, but for cardinality > CARDINALITY_POOL_CAP (100k) we generate a fresh fake per row. Spelled this out in the USAGE. * LOW (mod.rs USAGE Examples block): the second example had two `$ qsv` lines under one comment, so the MCP-skill generator gave both the same description. Split into two comments — "First, generate the Data Dictionary with describegpt" / "Then layer in semantic fakers from the dictionary" — and regenerated qsv-synthesize.json + synthesize.md. * LOW (qsv-synthesize.json missing `--jobs`): not addressed by design. `--jobs` is intentionally skipped from MCP skills as "infrastructure setting" by mcp_skills_gen.rs:367 — verified the same flag is omitted from qsv-stats.json (29 options, no jobs), qsv-frequency.json (35 options, no jobs). This is a project-wide convention, not a synthesize-specific bug. Tests: 12 unit + 9 integration, all passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(synthesize): add `synthesize` command to generate statistically-faithful synthetic CSVs `qsv synthesize <input.csv>` runs `stats` + `frequency` + `count` on the source and emits N rows of synthetic data that reproduce the source's per-column attributes: * Categorical / low-cardinality columns are reproduced by frequency-weighted sampling of the *real* value set — cardinality, weights and repetition structure preserved exactly. * Numeric and date/datetime columns use quartile-bucketed generation so the *shape* of the distribution is preserved, not just `[min, max]`. * Null ratios are reproduced per column. * `--seed` makes output fully reproducible (single master `StdRng` threads through both selection logic and every faker call). * `--dictionary <file>` layers in semantic Content Types from `describegpt --dictionary --infer-content-type` — each token maps to a `fake-rs` faker (40-token vocabulary covers names, emails, addresses, UUIDs, etc.). Bounded-cardinality faker columns sample from a fixed pre-generated pool of distinct fake values (the consistency mechanism, so a given logical value maps consistently). * `--infer-content-type` runs `describegpt` internally to build the dictionary on the fly (needs `QSV_LLM_APIKEY`). Cross-column correlation is explicitly out of scope for v1; columns are generated independently. Implementation notes: * New module `src/cmd/synthesize/{mod,dictionary,faker_map,generator}.rs`. Reuses `util::run_qsv_cmd` to shell out to `stats` / `frequency` / `count` (the same primitive `describegpt::perform_analysis` uses) — no refactor of describegpt itself was needed. * Bumped a handful of `pub(super)` items in `src/cmd/describegpt/dictionary.rs` to `pub(crate)` (`StatsRecord`, `FrequencyRecord`, `parse_stats_csv`, `parse_frequency_csv`, `CONTENT_TYPE_VOCAB`) and made the `dictionary` submodule `pub(crate)` so `synthesize` can consume them. * `fake = "5"` (features: `derive`, `chrono`, `uuid`, `random_color`) added as an optional dep. The new `synthesize` feature gates the dep and is wired into `distrib_features` and `qsvmcp` (not `lite`, not `datapusher_plus`). `fake` v5.1.0 uses `rand 0.10` (same as qsv), so its `fake_with_rng` accepts qsv's seeded `StdRng` directly — no RNG version-bridging needed. * Worked around a determinism bug in fake-rs's `Dummy<IP<L>> for String` (ignores the passed RNG); `ip_address` uses `IPv4` directly instead. * Made the help-markdown and MCP-skills generators support module-dir commands (`src/cmd/<name>/mod.rs`), not just flat `src/cmd/<name>.rs`. * Regen also picked up genuine doc drift in `qsv-stats.json`, `qsv-frequency.json`, and `docs/help/frequency.md` (Apache DataSketches big-endian notes) — included. v1 is English-only; `--locale` is reserved for future multi-locale support (each fake-rs locale is a distinct type, so multi-locale dispatch needs macro expansion). Tests: 12 unit tests + 9 integration tests, all passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * address review: suppress DevSkim false positives in synthesize 15 DevSkim code-scanning alerts on the synthesize PR, all false positives: * DS148264 ("weak/non-cryptographic RNG") — 12 hits on `StdRng::seed_from_u64` in *test code*. The whole point of these tests is determinism with a fixed seed; synthesize generates *fake data*, not security tokens. Added `// DevSkim: ignore DS148264` inline (matches the existing pattern in `select.rs:142`, `sample.rs:406`, `sort.rs:202-214`, `pragmastat.rs:668`). * DS126858 ("weak/broken hash algorithm") — 3 hits on the variable name `cmd2` in `tests/test_synthesize.rs`. The substring `md2` matched DevSkim's legacy-hash regex (md2/md4/md5/sha0/sha1). Renamed `cmd1`/`cmd2` → `first_cmd`/`second_cmd` — clearer anyway, no behavior change. Tests: 12 unit + 9 integration, all still passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * address review: Copilot fixes — count/--jobs, index fallback, date sampling Three valid issues caught by Copilot's PR review: * `--jobs` was passed to `qsv count`, which doesn't accept it. Split the analysis args into `delim_only_args` (for `count`, which accepts `-d` but not `--jobs`) and `analysis_common_args` (for `stats` / `frequency`, which accept both). `index` continues to get no extra args. * Best-effort indexing was effectively dead code: `Config::index_files()` returns `Ok(None)` (not `Err`) when no index exists, so `.is_err()` was false for the common unindexed case and the `qsv index` subprocess never fired. Switched to `!matches!(..., Ok(Some(_)))` so we now actually create the index when one is missing. Confirmed end-to-end — the "Indexed input" status line now appears for unindexed inputs. * `Date` columns severely under-sampled the max date: `stats` min/max/q* values for `Date` are at midnight, so sampling uniformly over epoch seconds put only a single tick of the max-day in the last bucket (the rest of that day fell outside any bucket). Now `parse_epoch` returns *whole days since the UNIX epoch* for `Date` columns (and still seconds for `DateTime`); the `DateQuantile::next` arm samples a whole day uniformly within the bucket and multiplies back to seconds for formatting. `DateTime` sampling is unchanged. Tests: 12 unit + 9 integration, all passing. Verified `qsv synthesize ... --jobs 4` works end-to-end. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * address roborev #2134: NaN guard, cardinality caveat, example descriptions 5 findings on the initial commit (4d706e1) — 1 HIGH already addressed in 899de97 (index_files() Ok(None) check), 4 still open: * MEDIUM (generator.rs `parse_f64`): `parse_f64("NaN")` returned `Some(f64::NAN)`, which slipped past both the `hi < lo` short-circuit and the quartile sanity check, and would then panic in `rng.random_range`. Added `.filter(|v| v.is_finite())` so non-finite endpoints fail at parse time. Same path now rejects `±Infinity`. * LOW (USAGE / docs): "logical value maps consistently" was unqualified, but for cardinality > CARDINALITY_POOL_CAP (100k) we generate a fresh fake per row. Spelled this out in the USAGE. * LOW (mod.rs USAGE Examples block): the second example had two `$ qsv` lines under one comment, so the MCP-skill generator gave both the same description. Split into two comments — "First, generate the Data Dictionary with describegpt" / "Then layer in semantic fakers from the dictionary" — and regenerated qsv-synthesize.json + synthesize.md. * LOW (qsv-synthesize.json missing `--jobs`): not addressed by design. `--jobs` is intentionally skipped from MCP skills as "infrastructure setting" by mcp_skills_gen.rs:367 — verified the same flag is omitted from qsv-stats.json (29 options, no jobs), qsv-frequency.json (35 options, no jobs). This is a project-wide convention, not a synthesize-specific bug. Tests: 12 unit + 9 integration, all passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * perf(synthesize): Reuse String buffer and simplify Option handling Declare a mutable String outside the loop and reuse it to avoid repeated allocations, adding #[allow(unused_assignments)] to suppress warnings. Replace the match on faker_map::content_type_to_value(...) with a direct call using the ? operator to return None early on failure, reducing nesting and simplifying control flow in build_faker_pool. * address roborev #2138: drop fake "buffer reuse" in build_faker_pool The prior commit hoisted `let mut value = String::new()` outside the loop under the banner of "reuse String buffer", but `content_type_to_value` returns a freshly-allocated `String` each call (the underlying fake-rs fakers always materialize an owned `String`). Reassigning the hoisted variable just drops the previous allocation and moves in the new one — no allocations are amortized. The `#[allow(unused_assignments)]` and the extra block were also pure noise. Dropped the hoist, the attribute, and the block. Behavior unchanged: let value = faker_map::content_type_to_value(content_type, rng)?; if seen.insert(value.clone()) { pool.push(value); } Actual buffer reuse would require changing `content_type_to_value` to write into a `&mut String`, but that wouldn't help either — fake-rs's `fake_with_rng::<String, _>` still allocates internally on every call. Tests: 12 unit + 9 integration, all passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

dependabot Bot added dependencies Pull requests that update a dependency file rust Pull requests that update Rust code labels Sep 12, 2024

jqnatividad merged commit 68e1a15 into master Sep 12, 2024

dependabot Bot deleted the dependabot/cargo/jsonschema-0.18.3 branch September 12, 2024 13:14

BrewTestBot mentioned this pull request Sep 24, 2024

qsv 0.135.0 Homebrew/homebrew-core#191718

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

build(deps): bump jsonschema from 0.18.2 to 0.18.3#2134

build(deps): bump jsonschema from 0.18.2 to 0.18.3#2134
jqnatividad merged 1 commit into
masterfrom
dependabot/cargo/jsonschema-0.18.3

dependabot Bot commented on behalf of github Sep 12, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

dependabot Bot commented on behalf of github Sep 12, 2024

[Python] Release 0.18.3

Fixed

[Rust] Release 0.18.3

Fixed

[0.18.3] - 2024-09-12

Fixed

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant