transpose: review-driven correctness, perf cleanup, and polish#3781
Merged
Conversation
- Drop `mmap.populate()` in --multipass: prefaulting the whole file contradicted the option's low-memory purpose. Lazy paging is what we want for streaming. - Hoist `rconfig` outside the multipass per-column loop and rewrite the misleading "reduce I/O" comment (the win is page-cache locality, not reduced I/O). - Eliminate the second file open in --select: read column names from the first record of the data reader instead of opening a separate no_headers(false) reader. `get_selected_indices()` is gone, replaced by a small `parse_select(&headers)` helper shared by all three modes. - Rename the misnamed `nrows` to `ncols` in transpose paths (it holds input column count, which becomes row count *after* transpose). - Reuse a single `Vec<u8>` buffer in `wide_to_long` for the multi-field concatenation case, and borrow the slice directly in the single-field case so we don't allocate per row. - Replace `unreachable!` with `fail_incorrectusage_clierror!` so a refactor that drops the docopt requirement can't panic. - Unify the two different `--long`/`--select` error prefixes; updated `transpose_long_format_no_columns_selected` to match the new prefix. - Add tests: stdin `--long` and mixed empty/non-empty `--long` values. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Up to standards ✅🟢 Issues
|
| Metric | Results |
|---|---|
| Complexity | -9 |
NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.
Contributor
There was a problem hiding this comment.
Pull request overview
This PR refines qsv transpose with a focus on correctness and performance, particularly around --multipass, --select, and --long behavior.
Changes:
- Removes eager
mmap.populate()in--multipassand reduces per-pass setup work to better align with low-memory goals. - Eliminates an extra reader/open by resolving
--selectagainst the already-available header record, using a sharedparse_selecthelper. - Improves
--longperformance (buffer reuse / fewer allocations) and adds/updates integration tests for stdin and empty-value handling.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
src/cmd/transpose.rs |
Refactors selection resolution, removes populate(), hoists config/setup, and reduces allocations in --long. |
tests/test_transpose.rs |
Updates expected error prefix and adds new tests for --long stdin and mixed empty values. |
- Document the `unsafe MmapOptions::map` invariants explicitly: file binding lives the whole function, only read access via `&mmap[..]`, no mutation/truncation by this command, plus the standard caveat about external processes. - Replace the detached writer thread in `transpose_long_format_stdin` with a synchronous `write_all` + `drop(stdin)`. The payload is small enough to fit in the pipe buffer, so the spawn pattern was unnecessary and would panic on BrokenPipe if the child exited early. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Code review of
src/cmd/transpose.rssurfaced one correctness/semantics bug, several perf improvements, and a few clarity wins. All changes land in one PR per review.Higher-severity:
mmap.populate()in--multipass—populate()prefaults every page of the input into RAM, which directly contradicts--multipass's purpose (low memory for big files). Lazy paging is what we want.rconfigout of the per-column multipass loop and rewrite the misleading "reduce I/O" comment (the actual win is page-cache locality across passes).no_headers(false)reader that was used purely to fetch column names for--select. The first record of the data reader already holds the column names; use it directly.get_selected_indices()is removed, replaced by a smallparse_select(&headers)helper shared across all three modes (wide_to_long,in_memory_transpose,multipass_transpose_streaming).nrows→ncolsin transpose paths — the variable holds the input column count (which becomes the row count after transpose). Confusing inside the pre-transpose reader scope.Medium:
Vec<u8>field buffer inwide_to_longfor the multi-field concatenation case; for the single-field case, borrow the slice directly without copying. Was allocating per-row before.--long/--selectparse failures (was mixing two different prefixes).unreachable!withfail_incorrectusage_clierror!inwide_to_longso a future refactor that drops the docopt<selection>requirement won't panic.Tests:
transpose_long_format_stdin—--longworks with stdin input (was untested).transpose_long_format_mixed_empty_values— verifies empty-value-skipping is selective (per-cell, not per-row).transpose_long_format_no_columns_selectedexpected error string to match the unified prefix.tests/test_transpose.rs: 32 tests, all pass.Test plan
cargo build --locked --bin qsv -F all_featurescargo t test_transpose -F all_features— 32 passed, 0 failedcargo clippy --bin qsv -F all_features -- -D warnings— cleanqsv transpose,qsv transpose --multipass,qsv transpose --select 2-3onresources/test/adur-public-toilets-valid.csv— output identical across modesprintf 'a,b,c\n1,2,3\n4,5,6\n' | qsv transpose --long 1— produces correct long-format output🤖 Generated with Claude Code