Skip to content

datefmt: review-driven cleanup, strict tz/flag validation, and ~9% perf win#3753

Merged
jqnatividad merged 3 commits into
masterfrom
datefmt-codereview-202604
Apr 26, 2026
Merged

datefmt: review-driven cleanup, strict tz/flag validation, and ~9% perf win#3753
jqnatividad merged 3 commits into
masterfrom
datefmt-codereview-202604

Conversation

@jqnatividad

Copy link
Copy Markdown
Collaborator

Summary

  • Tighten --input-tz / --output-tz parsing to error on unparseable IANA names; reject --utc / --zulu when combined with conflicting per-tz / formatstr flags instead of silently overriding.
  • Fix multi-column --new-column producing ragged CSVs — flag now accepts a comma-separated list mirroring --rename, and rejects single-name + multi-column with a clear error.
  • Strip the time component for both T00:00:00+00:00 and T00:00:00Z midnight outputs, so --zulu collapses bare-date inputs consistently with the default ISO formatter.
  • Single-pass record rebuild in the rayon closure replaces the per-row O(k·N) replace_column_value loop. Wide-CSV bench (50 cols, 200k rows, 5 selected): 503.8ms → 460.6ms (~9% faster, ~14% less user CPU).
  • Drop redundant .with_timezone(&Utc) calls in unix_timestamp, inline format_date_with_tz, hoist new_column out of the closure, remove an #[allow(unused_assignments)] that was masking the lint, fix the dropped negation in --default-tz USAGE wording, and reword --default-tz to describe the actual fallback semantics.
  • Extract --formatstr default into DEFAULT_FORMATSTR const used by the --zulu conflict check (with sync comment for the docopt USAGE literal).

Test plan

  • cargo build --locked --bin qsv -F all_features
  • cargo t test_datefmt -F all_features — 26/26 pass (was 19, added 7 new tests)
  • cargo clippy --locked --bin qsv -F all_features -- -D warnings — clean
  • Hyperfine perf bench on 50-col / 200k-row CSV with 5 date columns selected: 503.8ms → 460.6ms (~9% faster)
  • Two roborev reviews resolved (job 1698 baseline + job 1704 follow-up on the perf/cleanup commit; all findings addressed and reviews closed)

🤖 Generated with Claude Code

jqnatividad and others added 2 commits April 26, 2026 15:50
…rf win

- Tighten flag handling: --input-tz/--output-tz now error on unparseable
  IANA names instead of silently falling back to default-tz; --utc and
  --zulu now error when combined with conflicting --input-tz/--output-tz
  or --formatstr instead of silently overriding them.
- Fix multi-column --new-column producing ragged CSVs. The flag now
  accepts a comma-separated list of names that must match the selection
  count (mirroring --rename); single-name + multi-column is rejected.
- Strip the time component for both "T00:00:00+00:00" and "T00:00:00Z"
  midnight outputs, so --zulu collapses bare-date inputs consistently
  with the default ISO formatter.
- Single-pass record rebuild in the rayon closure replaces the per-row
  O(k·N) replace_column_value loop. Wide-CSV bench (50 cols, 200k rows,
  5 selected): 503.8ms → 460.6ms (~9% faster, ~14% less user CPU).
- Drop redundant `.with_timezone(&Utc)` calls in unix_timestamp,
  inline format_date_with_tz, hoist new_column out of the closure,
  and remove an #[allow(unused_assignments)] that was masking the lint.
- Fix USAGE wording on --default-tz (negation was dropped). Document
  --new-column comma-list semantics and add a comment explaining why
  atoi_simd rejects negative timestamps in the unix-ts fast path.
- Add tests: multi-col --new-column, count mismatch, invalid --input-tz,
  --utc/--input-tz conflict, --zulu/--formatstr conflict.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Extract --formatstr default into DEFAULT_FORMATSTR const used by the
  --zulu conflict check, with an inline comment that it must stay in
  sync with the docopt `[default: %+]` literal in USAGE.
- Reword --default-tz USAGE to describe the actual fallback semantics:
  it is consulted only when --input-tz/--output-tz is set to "local"
  but local-timezone detection fails, and does NOT override the
  --input-tz/--output-tz defaults. Avoids `--flag=value` literals in
  the description, which docopt was misparsing as flag definitions.
- Add tests for the two previously uncovered --utc conflict branches:
  datefmt_utc_conflicts_with_output_tz and
  datefmt_utc_conflicts_with_default_tz.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jqnatividad jqnatividad requested a review from Copilot April 26, 2026 20:17
@codacy-production

codacy-production Bot commented Apr 26, 2026

Copy link
Copy Markdown

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics 17 complexity

Metric Results
Complexity 17

View in Codacy

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves the datefmt command’s correctness, UX, and throughput by tightening timezone/flag validation, fixing multi-column --new-column behavior, and optimizing the per-row transformation path.

Changes:

  • Enforce stricter --input-tz / --output-tz parsing and reject conflicting shortcut flags (--utc, --zulu) instead of silently overriding user inputs.
  • Fix multi-column --new-column to accept a comma-separated list of names matching the selected column count and error clearly on mismatches.
  • Optimize the rayon record transform to rebuild output records in a single pass and normalize midnight UTC outputs (including ...Z) to date-only when --keep-zero-time is not set.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/cmd/datefmt.rs Adds strict tz/flag validation, multi-name --new-column handling, midnight UTC normalization, and a faster single-pass record rebuild.
tests/test_datefmt.rs Expands coverage for tz validation, shortcut conflict detection, and multi-column --new-column behavior; updates zulu midnight expectation.

Comment thread src/cmd/datefmt.rs Outdated
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@jqnatividad jqnatividad merged commit 106b181 into master Apr 26, 2026
14 of 15 checks passed
@jqnatividad jqnatividad deleted the datefmt-codereview-202604 branch April 26, 2026 20:25
jqnatividad added a commit that referenced this pull request Apr 27, 2026
…-free streaming loop (#3756)

* sortcheck: review-driven cleanup, add --numeric/--natural, allocation-free streaming loop

Bring sortcheck in line with the recent review-driven cleanups for sort
(#3755), dedup (#3754), datefmt (#3753), and util (#3752), and close a
real feature gap: sortcheck previously only supported lex / --ignore-case,
so users could not pre-verify a CSV before piping into `sort --numeric`,
`sort --natural`, or `dedup --numeric --sorted`.

Changes
- Add --numeric (-N) and --natural flags, reusing iter_cmp_num,
  iter_cmp_natural, and iter_cmp_natural_ignore_case already exported
  from cmd/sort.rs (no new comparators).
- Resolve flags into a ComparisonMode enum once before the loop, with
  the same precedence as sort/dedup: --natural > --numeric > --ignore-case.
  Dispatch is a single match per row, monomorphizing to one comparator.
- Replace record.clone_from(&next_record) with std::mem::swap on both
  Less and Greater arms — zero-copy buffer rotation, matching dedup #3754.
- args.flag_json | args.flag_pretty_json -> || (idiomatic logical OR).
- dupe_count as i64 -> i64::try_from(...).unwrap_or(i64::MAX) with a
  comment on why -1 signals "not applicable".
- Drop stale #[allow(dead_code)] on Args; add a comment on the
  do_json-forces-full-scan invariant that makes scan_ctr equivalent to
  record_count in the JSON path.
- USAGE: documented the new flags + their precedence, added an Examples
  block, tightened the extsort paragraph (sort itself is multi-threaded).
- Regenerated docs/help/sortcheck.md via `qsv --generate-help-md`.

Tests
- 10 new tests in tests/test_sortcheck.rs covering --ignore-case
  (including a real lex-vs-ignore-case discriminator), empty input,
  single-row input, --numeric sorted/unsorted, --natural sorted/unsorted,
  --natural + --ignore-case composition, and a precedence test
  confirming --natural overrides --numeric.
- 19/19 sortcheck tests pass; `cargo clippy -F all_features -- -D warnings` clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* address review: empty-input scan_ctr off-by-one, docopt-safe USAGE, ComparisonMode Copy, test rename

- Fix empty-input bug: a header-only CSV used to report
  `record_count: 1` because the loop incremented scan_ctr once even when
  the initial read_byte_record returned false. Guard the loop and
  simplify the JSON path to always use scan_ctr (do_json forces a full
  scan, so it's authoritative).
- Reword the --ignore-case USAGE block so no continuation line begins
  with `--natural` (docopt scans for option-like prefixes; lines starting
  with `--` could be misinterpreted as a second option definition).
- Also clarify the wording: --ignore-case is only ignored under PURE
  numeric comparison (--numeric without --natural), since the documented
  precedence allows --natural --numeric --ignore-case where ignore-case
  does apply.
- Derive Clone/Copy on ComparisonMode to mirror SortMode in cmd/sort.rs.
  Functionally a no-op (matching unit variants doesn't move), but
  defensive and consistent with sort.rs.
- Rename test sortcheck_ignore_case_sorted -> sortcheck_ignore_case_notsorted
  (the test asserts an error; the prior name was misleading — the
  truly-sorted-under-ignore-case case is covered by
  sortcheck_ignore_case_actually_sorted).
- Update sortcheck_empty test to expect record_count: 0 (was 1, codifying
  the off-by-one).
- Regenerated docs/help/sortcheck.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants