cat: review-driven cleanup, --no-headers fix and rowskey speedup#3750
Merged
Conversation
Fix a --no-headers correctness bug in `cat rowskey` where the second pass reused only the last file's synthetic header for all files, silently truncating trailing columns of any wider file appearing earlier. Per-file synthetic headers are now kept and indexed back in pass 2. Remove dead stdin-handling logic in `cat_rowskey` (process_input already materializes stdin to a temp file before we run, so Config::is_stdin() was never true here). Replace canonicalize/file_name/file_stem unwraps with `?` and graceful fallbacks. Warn when an input column collides with --group-name. Speed up the rowskey hot loop by precomputing a Vec<Option<usize>> mapping from the global column set to each file's columns once per file, replacing a per-cell IndexMap probe (with Box<[u8]> hashing) with a flat Vec walk. Add regression tests for the wider-first --no-headers case and the --group-name collision warning. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rror test - Emit the --group-name collision warning in the --no-headers branch as well; previously a synthetic header (e.g. `_c_1`) matching --group-name silently overrode the grouping value with no warning. Hoist group_name_bytes to a local while there. - Add cat_rowskey_no_headers_narrowerfirst as a symmetric regression test (narrower first, wider second), pinning the case that used to "work by accident" under the old per-pass shared synthetic header. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Up to standards ✅🟢 Issues
|
| Metric | Results |
|---|---|
| Complexity | 9 |
NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.
Contributor
There was a problem hiding this comment.
Pull request overview
Fixes correctness and performance issues in cat rowskey, particularly for --no-headers, while improving panic-safety and adding regression coverage.
Changes:
- Fix
cat rowskey --no-headerstwo-pass bug by storing per-file synthetic headers from pass 1 and reusing them in pass 2. - Improve
cat_rowskeythroughput by precomputing a global→per-file column index map (Vec<Option<usize>>) to avoid per-cell hashmap probes. - Add regression tests for wider-first / narrower-first
--no-headerscases and--group-namecollision warnings.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
src/cmd/cat.rs |
Implements the --no-headers fix, removes dead stdin code, adds collision warnings, improves panic-safety, and speeds up the rowskey hot loop. |
tests/test_cat.rs |
Adds regression tests covering the fixed --no-headers behavior and group-name collision warning behavior. |
…cution
- All three wwarn! sites in cat_rowskey were formatting `conf.path`
(Option<PathBuf>) with `{:?}`, rendering as `Some("...")` in user-facing
stderr. Switched to `Display` formatting with a `<stdin>` fallback for
the first-pass warnings; the second-pass duplicate-column warning uses
the already-unwrapped `conf_pathbuf.display()`.
- cat_rowskey_group_name_collision used to invoke the command twice
(once via output_stderr, once via read_stdout), which doubled runtime
and meant stderr/stdout assertions came from different processes. Now a
single `wrk.output()` call feeds both assertions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
cat rowskey --no-headerscorrectness bug where the second pass reused only the last file's synthetic header for all files, silently truncating trailing columns of any wider file appearing earlier. Per-file synthetic headers are now stored in pass 1 and indexed back in pass 2.cat_rowskey(util::process_inputalready materializes stdin to a temp file before we run, soConfig::is_stdin()was never true here). Drop the now-unusedDEFAULT_WTR_BUFFER_CAPACITYimport.canonicalize().unwrap(),file_name().unwrap(),file_stem().unwrap()andconf_path.unwrap()with?/ graceful fallbacks — consistent with the recentschema/fetch/slicepanic-safety passes.wwarn!) when an input column collides with--group-name, in both the headers and--no-headersbranches (silent override before).cat_rowskeyhot loop: precompute aVec<Option<usize>>mapping from the global column set to each file's columns once per file, replacing a per-cellIndexMapprobe (withBox<[u8]>hashing) with a flatVecwalk.Test plan
cargo build --locked --bin qsv -F all_featurescargo t cat -F all_features— 43 passed (was 42 before adding regression tests)cargo clippy --bin qsv -F all_features --no-deps— clean-, two-file--group fname, and the wider-first--no-headerscase all render correctly--no-headers, mirror narrower-first test) and closedNew regression tests:
cat_rowskey_no_headers_widerfirst— wider file first, narrower second; would have lost columns under the old codecat_rowskey_no_headers_narrowerfirst— symmetric mirror, pins the case that used to "work by accident"cat_rowskey_group_name_collision— asserts the warning fires and the documented "file column wins" behavior is preserved🤖 Generated with Claude Code