Skip to content

cat: review-driven cleanup, --no-headers fix and rowskey speedup#3750

Merged
jqnatividad merged 3 commits into
masterfrom
cat-codereview-202604
Apr 26, 2026
Merged

cat: review-driven cleanup, --no-headers fix and rowskey speedup#3750
jqnatividad merged 3 commits into
masterfrom
cat-codereview-202604

Conversation

@jqnatividad

Copy link
Copy Markdown
Collaborator

Summary

  • Fix a cat rowskey --no-headers correctness bug where the second pass reused only the last file's synthetic header for all files, silently truncating trailing columns of any wider file appearing earlier. Per-file synthetic headers are now stored in pass 1 and indexed back in pass 2.
  • Remove dead stdin-handling code in cat_rowskey (util::process_input already materializes stdin to a temp file before we run, so Config::is_stdin() was never true here). Drop the now-unused DEFAULT_WTR_BUFFER_CAPACITY import.
  • Replace canonicalize().unwrap(), file_name().unwrap(), file_stem().unwrap() and conf_path.unwrap() with ? / graceful fallbacks — consistent with the recent schema/fetch/slice panic-safety passes.
  • Warn (wwarn!) when an input column collides with --group-name, in both the headers and --no-headers branches (silent override before).
  • Speed up the cat_rowskey hot loop: precompute a Vec<Option<usize>> mapping from the global column set to each file's columns once per file, replacing a per-cell IndexMap probe (with Box<[u8]> hashing) with a flat Vec walk.

Test plan

  • cargo build --locked --bin qsv -F all_features
  • cargo t cat -F all_features — 43 passed (was 42 before adding regression tests)
  • cargo clippy --bin qsv -F all_features --no-deps — clean
  • Manual smoke: stdin via -, two-file --group fname, and the wider-first --no-headers case all render correctly
  • roborev review datefmt doesn't recognize this timestamp #1692 addressed (LOW: collision warning in --no-headers, mirror narrower-first test) and closed

New regression tests:

  • cat_rowskey_no_headers_widerfirst — wider file first, narrower second; would have lost columns under the old code
  • cat_rowskey_no_headers_narrowerfirst — symmetric mirror, pins the case that used to "work by accident"
  • cat_rowskey_group_name_collision — asserts the warning fires and the documented "file column wins" behavior is preserved

🤖 Generated with Claude Code

jqnatividad and others added 2 commits April 26, 2026 11:42
Fix a --no-headers correctness bug in `cat rowskey` where the second pass
reused only the last file's synthetic header for all files, silently
truncating trailing columns of any wider file appearing earlier. Per-file
synthetic headers are now kept and indexed back in pass 2.

Remove dead stdin-handling logic in `cat_rowskey` (process_input already
materializes stdin to a temp file before we run, so Config::is_stdin() was
never true here). Replace canonicalize/file_name/file_stem unwraps with `?`
and graceful fallbacks. Warn when an input column collides with --group-name.

Speed up the rowskey hot loop by precomputing a Vec<Option<usize>> mapping
from the global column set to each file's columns once per file, replacing
a per-cell IndexMap probe (with Box<[u8]> hashing) with a flat Vec walk.

Add regression tests for the wider-first --no-headers case and the
--group-name collision warning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rror test

- Emit the --group-name collision warning in the --no-headers branch as
  well; previously a synthetic header (e.g. `_c_1`) matching --group-name
  silently overrode the grouping value with no warning. Hoist
  group_name_bytes to a local while there.
- Add cat_rowskey_no_headers_narrowerfirst as a symmetric regression
  test (narrower first, wider second), pinning the case that used to
  "work by accident" under the old per-pass shared synthetic header.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jqnatividad jqnatividad requested a review from Copilot April 26, 2026 15:50
@codacy-production

codacy-production Bot commented Apr 26, 2026

Copy link
Copy Markdown

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics 9 complexity

Metric Results
Complexity 9

View in Codacy

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes correctness and performance issues in cat rowskey, particularly for --no-headers, while improving panic-safety and adding regression coverage.

Changes:

  • Fix cat rowskey --no-headers two-pass bug by storing per-file synthetic headers from pass 1 and reusing them in pass 2.
  • Improve cat_rowskey throughput by precomputing a global→per-file column index map (Vec<Option<usize>>) to avoid per-cell hashmap probes.
  • Add regression tests for wider-first / narrower-first --no-headers cases and --group-name collision warnings.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
src/cmd/cat.rs Implements the --no-headers fix, removes dead stdin code, adds collision warnings, improves panic-safety, and speeds up the rowskey hot loop.
tests/test_cat.rs Adds regression tests covering the fixed --no-headers behavior and group-name collision warning behavior.

Comment thread src/cmd/cat.rs
Comment thread src/cmd/cat.rs
Comment thread tests/test_cat.rs Outdated
…cution

- All three wwarn! sites in cat_rowskey were formatting `conf.path`
  (Option<PathBuf>) with `{:?}`, rendering as `Some("...")` in user-facing
  stderr. Switched to `Display` formatting with a `<stdin>` fallback for
  the first-pass warnings; the second-pass duplicate-column warning uses
  the already-unwrapped `conf_pathbuf.display()`.
- cat_rowskey_group_name_collision used to invoke the command twice
  (once via output_stderr, once via read_stdout), which doubled runtime
  and meant stderr/stdout assertions came from different processes. Now a
  single `wrk.output()` call feeds both assertions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jqnatividad jqnatividad merged commit d983ac1 into master Apr 26, 2026
16 checks passed
@jqnatividad jqnatividad deleted the cat-codereview-202604 branch April 26, 2026 16:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants