Skip to content

transpose: review-driven correctness, perf cleanup, and polish#3781

Merged
jqnatividad merged 2 commits into
masterfrom
transpose-review-fixes
Apr 29, 2026
Merged

transpose: review-driven correctness, perf cleanup, and polish#3781
jqnatividad merged 2 commits into
masterfrom
transpose-review-fixes

Conversation

@jqnatividad

Copy link
Copy Markdown
Collaborator

Summary

Code review of src/cmd/transpose.rs surfaced one correctness/semantics bug, several perf improvements, and a few clarity wins. All changes land in one PR per review.

Higher-severity:

  • Drop mmap.populate() in --multipasspopulate() prefaults every page of the input into RAM, which directly contradicts --multipass's purpose (low memory for big files). Lazy paging is what we want.
  • Hoist rconfig out of the per-column multipass loop and rewrite the misleading "reduce I/O" comment (the actual win is page-cache locality across passes).
  • Single file open per invocation — eliminate the second no_headers(false) reader that was used purely to fetch column names for --select. The first record of the data reader already holds the column names; use it directly. get_selected_indices() is removed, replaced by a small parse_select(&headers) helper shared across all three modes (wide_to_long, in_memory_transpose, multipass_transpose_streaming).
  • Rename nrowsncols in transpose paths — the variable holds the input column count (which becomes the row count after transpose). Confusing inside the pre-transpose reader scope.

Medium:

  • Reuse Vec<u8> field buffer in wide_to_long for the multi-field concatenation case; for the single-field case, borrow the slice directly without copying. Was allocating per-row before.
  • Unify error wrapping for --long / --select parse failures (was mixing two different prefixes).
  • Replace unreachable! with fail_incorrectusage_clierror! in wide_to_long so a future refactor that drops the docopt <selection> requirement won't panic.

Tests:

  • Add transpose_long_format_stdin--long works with stdin input (was untested).
  • Add transpose_long_format_mixed_empty_values — verifies empty-value-skipping is selective (per-cell, not per-row).
  • Update transpose_long_format_no_columns_selected expected error string to match the unified prefix.

tests/test_transpose.rs: 32 tests, all pass.

Test plan

  • cargo build --locked --bin qsv -F all_features
  • cargo t test_transpose -F all_features — 32 passed, 0 failed
  • cargo clippy --bin qsv -F all_features -- -D warnings — clean
  • Smoke: qsv transpose, qsv transpose --multipass, qsv transpose --select 2-3 on resources/test/adur-public-toilets-valid.csv — output identical across modes
  • Smoke: printf 'a,b,c\n1,2,3\n4,5,6\n' | qsv transpose --long 1 — produces correct long-format output

🤖 Generated with Claude Code

- Drop `mmap.populate()` in --multipass: prefaulting the whole file
  contradicted the option's low-memory purpose. Lazy paging is what we
  want for streaming.
- Hoist `rconfig` outside the multipass per-column loop and rewrite the
  misleading "reduce I/O" comment (the win is page-cache locality, not
  reduced I/O).
- Eliminate the second file open in --select: read column names from the
  first record of the data reader instead of opening a separate
  no_headers(false) reader. `get_selected_indices()` is gone, replaced
  by a small `parse_select(&headers)` helper shared by all three modes.
- Rename the misnamed `nrows` to `ncols` in transpose paths (it holds
  input column count, which becomes row count *after* transpose).
- Reuse a single `Vec<u8>` buffer in `wide_to_long` for the multi-field
  concatenation case, and borrow the slice directly in the
  single-field case so we don't allocate per row.
- Replace `unreachable!` with `fail_incorrectusage_clierror!` so a
  refactor that drops the docopt requirement can't panic.
- Unify the two different `--long`/`--select` error prefixes; updated
  `transpose_long_format_no_columns_selected` to match the new prefix.
- Add tests: stdin `--long` and mixed empty/non-empty `--long` values.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codacy-production

codacy-production Bot commented Apr 29, 2026

Copy link
Copy Markdown

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics -9 complexity

Metric Results
Complexity -9

View in Codacy

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refines qsv transpose with a focus on correctness and performance, particularly around --multipass, --select, and --long behavior.

Changes:

  • Removes eager mmap.populate() in --multipass and reduces per-pass setup work to better align with low-memory goals.
  • Eliminates an extra reader/open by resolving --select against the already-available header record, using a shared parse_select helper.
  • Improves --long performance (buffer reuse / fewer allocations) and adds/updates integration tests for stdin and empty-value handling.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/cmd/transpose.rs Refactors selection resolution, removes populate(), hoists config/setup, and reduces allocations in --long.
tests/test_transpose.rs Updates expected error prefix and adds new tests for --long stdin and mixed empty values.

Comment thread tests/test_transpose.rs Outdated
Comment thread src/cmd/transpose.rs Outdated
- Document the `unsafe MmapOptions::map` invariants explicitly: file
  binding lives the whole function, only read access via `&mmap[..]`,
  no mutation/truncation by this command, plus the standard caveat
  about external processes.
- Replace the detached writer thread in `transpose_long_format_stdin`
  with a synchronous `write_all` + `drop(stdin)`. The payload is small
  enough to fit in the pipe buffer, so the spawn pattern was unnecessary
  and would panic on BrokenPipe if the child exited early.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jqnatividad jqnatividad merged commit be0adab into master Apr 29, 2026
14 of 17 checks passed
@jqnatividad jqnatividad deleted the transpose-review-fixes branch April 29, 2026 10:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants