datefmt: review-driven cleanup, strict tz/flag validation, and ~9% perf win#3753
Merged
Conversation
…rf win - Tighten flag handling: --input-tz/--output-tz now error on unparseable IANA names instead of silently falling back to default-tz; --utc and --zulu now error when combined with conflicting --input-tz/--output-tz or --formatstr instead of silently overriding them. - Fix multi-column --new-column producing ragged CSVs. The flag now accepts a comma-separated list of names that must match the selection count (mirroring --rename); single-name + multi-column is rejected. - Strip the time component for both "T00:00:00+00:00" and "T00:00:00Z" midnight outputs, so --zulu collapses bare-date inputs consistently with the default ISO formatter. - Single-pass record rebuild in the rayon closure replaces the per-row O(k·N) replace_column_value loop. Wide-CSV bench (50 cols, 200k rows, 5 selected): 503.8ms → 460.6ms (~9% faster, ~14% less user CPU). - Drop redundant `.with_timezone(&Utc)` calls in unix_timestamp, inline format_date_with_tz, hoist new_column out of the closure, and remove an #[allow(unused_assignments)] that was masking the lint. - Fix USAGE wording on --default-tz (negation was dropped). Document --new-column comma-list semantics and add a comment explaining why atoi_simd rejects negative timestamps in the unix-ts fast path. - Add tests: multi-col --new-column, count mismatch, invalid --input-tz, --utc/--input-tz conflict, --zulu/--formatstr conflict. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Extract --formatstr default into DEFAULT_FORMATSTR const used by the --zulu conflict check, with an inline comment that it must stay in sync with the docopt `[default: %+]` literal in USAGE. - Reword --default-tz USAGE to describe the actual fallback semantics: it is consulted only when --input-tz/--output-tz is set to "local" but local-timezone detection fails, and does NOT override the --input-tz/--output-tz defaults. Avoids `--flag=value` literals in the description, which docopt was misparsing as flag definitions. - Add tests for the two previously uncovered --utc conflict branches: datefmt_utc_conflicts_with_output_tz and datefmt_utc_conflicts_with_default_tz. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Up to standards ✅🟢 Issues
|
| Metric | Results |
|---|---|
| Complexity | 17 |
NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.
Contributor
There was a problem hiding this comment.
Pull request overview
This PR improves the datefmt command’s correctness, UX, and throughput by tightening timezone/flag validation, fixing multi-column --new-column behavior, and optimizing the per-row transformation path.
Changes:
- Enforce stricter
--input-tz/--output-tzparsing and reject conflicting shortcut flags (--utc,--zulu) instead of silently overriding user inputs. - Fix multi-column
--new-columnto accept a comma-separated list of names matching the selected column count and error clearly on mismatches. - Optimize the rayon record transform to rebuild output records in a single pass and normalize midnight UTC outputs (including
...Z) to date-only when--keep-zero-timeis not set.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
src/cmd/datefmt.rs |
Adds strict tz/flag validation, multi-name --new-column handling, midnight UTC normalization, and a faster single-pass record rebuild. |
tests/test_datefmt.rs |
Expands coverage for tz validation, shortcut conflict detection, and multi-column --new-column behavior; updates zulu midnight expectation. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Merged
5 tasks
jqnatividad
added a commit
that referenced
this pull request
Apr 27, 2026
…-free streaming loop (#3756) * sortcheck: review-driven cleanup, add --numeric/--natural, allocation-free streaming loop Bring sortcheck in line with the recent review-driven cleanups for sort (#3755), dedup (#3754), datefmt (#3753), and util (#3752), and close a real feature gap: sortcheck previously only supported lex / --ignore-case, so users could not pre-verify a CSV before piping into `sort --numeric`, `sort --natural`, or `dedup --numeric --sorted`. Changes - Add --numeric (-N) and --natural flags, reusing iter_cmp_num, iter_cmp_natural, and iter_cmp_natural_ignore_case already exported from cmd/sort.rs (no new comparators). - Resolve flags into a ComparisonMode enum once before the loop, with the same precedence as sort/dedup: --natural > --numeric > --ignore-case. Dispatch is a single match per row, monomorphizing to one comparator. - Replace record.clone_from(&next_record) with std::mem::swap on both Less and Greater arms — zero-copy buffer rotation, matching dedup #3754. - args.flag_json | args.flag_pretty_json -> || (idiomatic logical OR). - dupe_count as i64 -> i64::try_from(...).unwrap_or(i64::MAX) with a comment on why -1 signals "not applicable". - Drop stale #[allow(dead_code)] on Args; add a comment on the do_json-forces-full-scan invariant that makes scan_ctr equivalent to record_count in the JSON path. - USAGE: documented the new flags + their precedence, added an Examples block, tightened the extsort paragraph (sort itself is multi-threaded). - Regenerated docs/help/sortcheck.md via `qsv --generate-help-md`. Tests - 10 new tests in tests/test_sortcheck.rs covering --ignore-case (including a real lex-vs-ignore-case discriminator), empty input, single-row input, --numeric sorted/unsorted, --natural sorted/unsorted, --natural + --ignore-case composition, and a precedence test confirming --natural overrides --numeric. - 19/19 sortcheck tests pass; `cargo clippy -F all_features -- -D warnings` clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * address review: empty-input scan_ctr off-by-one, docopt-safe USAGE, ComparisonMode Copy, test rename - Fix empty-input bug: a header-only CSV used to report `record_count: 1` because the loop incremented scan_ctr once even when the initial read_byte_record returned false. Guard the loop and simplify the JSON path to always use scan_ctr (do_json forces a full scan, so it's authoritative). - Reword the --ignore-case USAGE block so no continuation line begins with `--natural` (docopt scans for option-like prefixes; lines starting with `--` could be misinterpreted as a second option definition). - Also clarify the wording: --ignore-case is only ignored under PURE numeric comparison (--numeric without --natural), since the documented precedence allows --natural --numeric --ignore-case where ignore-case does apply. - Derive Clone/Copy on ComparisonMode to mirror SortMode in cmd/sort.rs. Functionally a no-op (matching unit variants doesn't move), but defensive and consistent with sort.rs. - Rename test sortcheck_ignore_case_sorted -> sortcheck_ignore_case_notsorted (the test asserts an error; the prior name was misleading — the truly-sorted-under-ignore-case case is covered by sortcheck_ignore_case_actually_sorted). - Update sortcheck_empty test to expect record_count: 0 (was 1, codifying the off-by-one). - Regenerated docs/help/sortcheck.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
--input-tz/--output-tzparsing to error on unparseable IANA names; reject--utc/--zuluwhen combined with conflicting per-tz / formatstr flags instead of silently overriding.--new-columnproducing ragged CSVs — flag now accepts a comma-separated list mirroring--rename, and rejects single-name + multi-column with a clear error.T00:00:00+00:00andT00:00:00Zmidnight outputs, so--zulucollapses bare-date inputs consistently with the default ISO formatter.replace_column_valueloop. Wide-CSV bench (50 cols, 200k rows, 5 selected): 503.8ms → 460.6ms (~9% faster, ~14% less user CPU)..with_timezone(&Utc)calls inunix_timestamp, inlineformat_date_with_tz, hoistnew_columnout of the closure, remove an#[allow(unused_assignments)]that was masking the lint, fix the dropped negation in--default-tzUSAGE wording, and reword--default-tzto describe the actual fallback semantics.--formatstrdefault intoDEFAULT_FORMATSTRconst used by the--zuluconflict check (with sync comment for the docopt USAGE literal).Test plan
cargo build --locked --bin qsv -F all_featurescargo t test_datefmt -F all_features— 26/26 pass (was 19, added 7 new tests)cargo clippy --locked --bin qsv -F all_features -- -D warnings— clean🤖 Generated with Claude Code