replace: fix empty-index header skip, dead match tracking, port streaming parallel write#3777
Merged
Merged
Conversation
…ch-flag tracking, streaming parallel write Brings `qsv replace` into line with the recent `search`/`searchset` parallel refactor (#3776) and addresses several issues found while reviewing src/cmd/replace.rs. Bug fixes - parallel_replace skipped headers and ignored --not-one when the indexed file had zero data rows; sequential_replace always wrote headers. Both paths now emit headers up-front and route through handle_replace_results so --not-one semantics match. - Worker threads no longer .unwrap() on indexed()/seek()/byte_records() errors; CliResult is propagated via a closure, mirroring search.rs. Cleanup - Drop the dead `had_match` field from the per-record tuple and the `_rows_with_matches_ctr` counter that was incremented but never read in parallel_replace. The sequential progress message now derives `had_match` from `match_count > 0`. - parallel_replace now takes &Regex (matching sequential_replace) and clones once into the Arc. - Drop a redundant `record = processed_record;` reassignment in sequential_replace, a stale `// self.clone(),` comment, the `#[allow(dead_code)]` on Args, and the unqualified `use crossbeam_channel;` import. Streaming parallel write - Replace the whole-file `Vec<Vec<ReplaceResult>>` buffer + sort with the BTreeMap<usize, ChunkOutput> + next_chunk cursor pattern from search.rs. Peak memory is now ~njobs × chunk_size records instead of the entire file. Per-row `row_number: u64` is gone (chunks carry their `chunk_index` for ordering; within-chunk order is preserved by push order). Tests - New tests/test_replace.rs::replace_indexed_empty regression test covering the empty-indexed-file path with and without --not-one. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Up to standards ✅🟢 Issues
|
| Metric | Results |
|---|---|
| Complexity | 7 |
NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.
Contributor
There was a problem hiding this comment.
Pull request overview
This PR improves qsv replace’s indexed/parallel execution path to match the sequential path’s semantics (especially for empty indexed inputs) and to reduce peak memory usage by streaming parallel chunk outputs in input order.
Changes:
- Fix parallel replace behavior for indexed files with zero data rows: headers are emitted and
--not-one/exit-code semantics match sequential replace. - Propagate worker I/O/index/CSV errors via
CliResultinstead of panicking, and remove dead match-tracking state in the parallel path. - Switch parallel output from “buffer everything then sort” to a streaming
BTreeMap + next_chunkordering pattern (ported fromsearch).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
src/cmd/replace.rs |
Aligns parallel replace semantics with sequential, removes dead tracking, propagates errors, and streams chunk outputs in order to bound memory. |
tests/test_replace.rs |
Adds a regression test covering the empty-indexed-file behavior (headers + --not-one semantics) under parallel mode. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Code review of
src/cmd/replace.rsagainst the recentsearch/searchsetparallel refactor (#3776) surfaced one observable bug, dead code, and an efficiency gap. This PR addresses all of them.parallel_replaceskipped headers and ignored--not-onewhen the indexed file had zero data rows;sequential_replacealways wrote headers. Both paths now emit headers up-front and route throughhandle_replace_resultsso--not-onesemantics match..unwrap()onindexed()/seek()/byte_records()errors —CliResultis propagated through a closure, matchingsearch.rs:582-810.had_matchfield +_rows_with_matches_ctrcounter from the parallel path (the sequential progress message now derives it frommatch_count > 0); makeparallel_replacetake&Regex(matchingsequential_replace); drop a redundantrecord = processed_record;reassignment, a stale// self.clone(),comment, the#[allow(dead_code)]onArgs, and the unqualifieduse crossbeam_channel;.Vec<Vec<ReplaceResult>>buffer + sort with theBTreeMap<usize, ChunkOutput>+next_chunkcursor pattern fromsearch.rs. Peak memory is now ~njobs × chunk_sizerecords instead of the entire file. Per-rowrow_number: u64is gone — chunks carrychunk_indexfor ordering; within-chunk order is preserved by push order.Test plan
cargo build --locked --bin qsv -F all_featurescargo test --locked -F all_features replace— 31/31 passing, including existingreplace_indexed_parallel(multi-chunk ordering with--jobs 2) and the newreplace_indexed_emptyregressioncargo clippy --locked -F all_features --bin qsv -- -D warnings— clean🤖 Generated with Claude Code