Skip to content

replace: fix empty-index header skip, dead match tracking, port streaming parallel write#3777

Merged
jqnatividad merged 1 commit into
masterfrom
replace-review-202604
Apr 28, 2026
Merged

replace: fix empty-index header skip, dead match tracking, port streaming parallel write#3777
jqnatividad merged 1 commit into
masterfrom
replace-review-202604

Conversation

@jqnatividad

Copy link
Copy Markdown
Collaborator

Summary

Code review of src/cmd/replace.rs against the recent search/searchset parallel refactor (#3776) surfaced one observable bug, dead code, and an efficiency gap. This PR addresses all of them.

  • Bug fix: parallel_replace skipped headers and ignored --not-one when the indexed file had zero data rows; sequential_replace always wrote headers. Both paths now emit headers up-front and route through handle_replace_results so --not-one semantics match.
  • Bug fix: worker threads no longer .unwrap() on indexed() / seek() / byte_records() errors — CliResult is propagated through a closure, matching search.rs:582-810.
  • Cleanup: drop the dead had_match field + _rows_with_matches_ctr counter from the parallel path (the sequential progress message now derives it from match_count > 0); make parallel_replace take &Regex (matching sequential_replace); drop a redundant record = processed_record; reassignment, a stale // self.clone(), comment, the #[allow(dead_code)] on Args, and the unqualified use crossbeam_channel;.
  • Streaming parallel write: replace the whole-file Vec<Vec<ReplaceResult>> buffer + sort with the BTreeMap<usize, ChunkOutput> + next_chunk cursor pattern from search.rs. Peak memory is now ~njobs × chunk_size records instead of the entire file. Per-row row_number: u64 is gone — chunks carry chunk_index for ordering; within-chunk order is preserved by push order.

Test plan

  • cargo build --locked --bin qsv -F all_features
  • cargo test --locked -F all_features replace — 31/31 passing, including existing replace_indexed_parallel (multi-chunk ordering with --jobs 2) and the new replace_indexed_empty regression
  • cargo clippy --locked -F all_features --bin qsv -- -D warnings — clean

🤖 Generated with Claude Code

…ch-flag tracking, streaming parallel write

Brings `qsv replace` into line with the recent `search`/`searchset`
parallel refactor (#3776) and addresses several issues found while
reviewing src/cmd/replace.rs.

Bug fixes
- parallel_replace skipped headers and ignored --not-one when the
  indexed file had zero data rows; sequential_replace always wrote
  headers. Both paths now emit headers up-front and route through
  handle_replace_results so --not-one semantics match.
- Worker threads no longer .unwrap() on indexed()/seek()/byte_records()
  errors; CliResult is propagated via a closure, mirroring search.rs.

Cleanup
- Drop the dead `had_match` field from the per-record tuple and the
  `_rows_with_matches_ctr` counter that was incremented but never read
  in parallel_replace. The sequential progress message now derives
  `had_match` from `match_count > 0`.
- parallel_replace now takes &Regex (matching sequential_replace) and
  clones once into the Arc.
- Drop a redundant `record = processed_record;` reassignment in
  sequential_replace, a stale `// self.clone(),` comment, the
  `#[allow(dead_code)]` on Args, and the unqualified
  `use crossbeam_channel;` import.

Streaming parallel write
- Replace the whole-file `Vec<Vec<ReplaceResult>>` buffer + sort with
  the BTreeMap<usize, ChunkOutput> + next_chunk cursor pattern from
  search.rs. Peak memory is now ~njobs × chunk_size records instead of
  the entire file. Per-row `row_number: u64` is gone (chunks carry
  their `chunk_index` for ordering; within-chunk order is preserved by
  push order).

Tests
- New tests/test_replace.rs::replace_indexed_empty regression test
  covering the empty-indexed-file path with and without --not-one.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codacy-production

Copy link
Copy Markdown

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics 7 complexity

Metric Results
Complexity 7

View in Codacy

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves qsv replace’s indexed/parallel execution path to match the sequential path’s semantics (especially for empty indexed inputs) and to reduce peak memory usage by streaming parallel chunk outputs in input order.

Changes:

  • Fix parallel replace behavior for indexed files with zero data rows: headers are emitted and --not-one/exit-code semantics match sequential replace.
  • Propagate worker I/O/index/CSV errors via CliResult instead of panicking, and remove dead match-tracking state in the parallel path.
  • Switch parallel output from “buffer everything then sort” to a streaming BTreeMap + next_chunk ordering pattern (ported from search).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
src/cmd/replace.rs Aligns parallel replace semantics with sequential, removes dead tracking, propagates errors, and streams chunk outputs in order to bound memory.
tests/test_replace.rs Adds a regression test covering the empty-indexed-file behavior (headers + --not-one semantics) under parallel mode.

@jqnatividad jqnatividad merged commit b1a2cdd into master Apr 28, 2026
21 checks passed
@jqnatividad jqnatividad deleted the replace-review-202604 branch April 28, 2026 18:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants