replace: fix empty-index header skip, dead match tracking, port streaming parallel write#3777

jqnatividad · 2026-04-28T17:34:02Z

Summary

Code review of src/cmd/replace.rs against the recent search/searchset parallel refactor (#3776) surfaced one observable bug, dead code, and an efficiency gap. This PR addresses all of them.

Bug fix: parallel_replace skipped headers and ignored --not-one when the indexed file had zero data rows; sequential_replace always wrote headers. Both paths now emit headers up-front and route through handle_replace_results so --not-one semantics match.
Bug fix: worker threads no longer .unwrap() on indexed() / seek() / byte_records() errors — CliResult is propagated through a closure, matching search.rs:582-810.
Cleanup: drop the dead had_match field + _rows_with_matches_ctr counter from the parallel path (the sequential progress message now derives it from match_count > 0); make parallel_replace take &Regex (matching sequential_replace); drop a redundant record = processed_record; reassignment, a stale // self.clone(), comment, the #[allow(dead_code)] on Args, and the unqualified use crossbeam_channel;.
Streaming parallel write: replace the whole-file Vec<Vec<ReplaceResult>> buffer + sort with the BTreeMap<usize, ChunkOutput> + next_chunk cursor pattern from search.rs. Peak memory is now ~njobs × chunk_size records instead of the entire file. Per-row row_number: u64 is gone — chunks carry chunk_index for ordering; within-chunk order is preserved by push order.

Test plan

cargo build --locked --bin qsv -F all_features
cargo test --locked -F all_features replace — 31/31 passing, including existing replace_indexed_parallel (multi-chunk ordering with --jobs 2) and the new replace_indexed_empty regression
cargo clippy --locked -F all_features --bin qsv -- -D warnings — clean

🤖 Generated with Claude Code

…ch-flag tracking, streaming parallel write Brings `qsv replace` into line with the recent `search`/`searchset` parallel refactor (#3776) and addresses several issues found while reviewing src/cmd/replace.rs. Bug fixes - parallel_replace skipped headers and ignored --not-one when the indexed file had zero data rows; sequential_replace always wrote headers. Both paths now emit headers up-front and route through handle_replace_results so --not-one semantics match. - Worker threads no longer .unwrap() on indexed()/seek()/byte_records() errors; CliResult is propagated via a closure, mirroring search.rs. Cleanup - Drop the dead `had_match` field from the per-record tuple and the `_rows_with_matches_ctr` counter that was incremented but never read in parallel_replace. The sequential progress message now derives `had_match` from `match_count > 0`. - parallel_replace now takes &Regex (matching sequential_replace) and clones once into the Arc. - Drop a redundant `record = processed_record;` reassignment in sequential_replace, a stale `// self.clone(),` comment, the `#[allow(dead_code)]` on Args, and the unqualified `use crossbeam_channel;` import. Streaming parallel write - Replace the whole-file `Vec<Vec<ReplaceResult>>` buffer + sort with the BTreeMap<usize, ChunkOutput> + next_chunk cursor pattern from search.rs. Peak memory is now ~njobs × chunk_size records instead of the entire file. Per-row `row_number: u64` is gone (chunks carry their `chunk_index` for ordering; within-chunk order is preserved by push order). Tests - New tests/test_replace.rs::replace_indexed_empty regression test covering the empty-indexed-file path with and without --not-one. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

codacy-production · 2026-04-28T17:35:14Z

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics 7 complexity

Metric Results

Complexity 7

View in Codacy

_{NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer}
_{TIP This summary will be updated as you push new changes.}

Copilot

Pull request overview

This PR improves qsv replace’s indexed/parallel execution path to match the sequential path’s semantics (especially for empty indexed inputs) and to reduce peak memory usage by streaming parallel chunk outputs in input order.

Changes:

Fix parallel replace behavior for indexed files with zero data rows: headers are emitted and --not-one/exit-code semantics match sequential replace.
Propagate worker I/O/index/CSV errors via CliResult instead of panicking, and remove dead match-tracking state in the parallel path.
Switch parallel output from “buffer everything then sort” to a streaming BTreeMap + next_chunk ordering pattern (ported from search).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
`src/cmd/replace.rs`	Aligns parallel replace semantics with sequential, removes dead tracking, propagates errors, and streams chunk outputs in order to bound memory.
`tests/test_replace.rs`	Adds a regression test covering the empty-indexed-file behavior (headers + `--not-one` semantics) under parallel mode.

jqnatividad requested a review from Copilot April 28, 2026 17:35

Copilot started reviewing on behalf of jqnatividad April 28, 2026 17:40 View session

Copilot AI reviewed Apr 28, 2026

View reviewed changes

jqnatividad merged commit b1a2cdd into master Apr 28, 2026
21 checks passed

jqnatividad deleted the replace-review-202604 branch April 28, 2026 18:15

BrewTestBot mentioned this pull request May 3, 2026

qsv 20.0.0 Homebrew/homebrew-core#280687

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

replace: fix empty-index header skip, dead match tracking, port streaming parallel write#3777