exclude: review-driven cleanup, stdin support and memcheck#3749
Merged
Conversation
Strip dead random-access index machinery from ValueIndex (idx, num_rows, row-position byte buffer, Indexed::open, header-vs-no-headers seek branch, the unused Vec<usize> row tracker, and byteorder/Indexed deps). Only the value lookup is ever consumed, so collapse to a HashSet<Vec<ByteString>>. Switch both inputs from reader_file() to reader_file_stdin() so either input can be `-`. Fail if both are stdin. Add --memcheck flag and call util::mem_file_check on input2 (which is fully loaded into memory), matching the dedup convention. Polish: named VALUE_SET_INITIAL_CAPACITY const, get_row_key helper used at both sites, read_byte_record loop replaces the unused_assignments pattern, match collapsed to `if matched == invert`. Tests: add column-range, tab delimiter, stdin, both-stdin error, and mismatched-column-count coverage (6 -> 11 tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Up to standards ✅🟢 Issues
|
| Metric | Results |
|---|---|
| Complexity | 14 |
NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.
Contributor
There was a problem hiding this comment.
Pull request overview
This PR continues the review-driven cleanup sweep by refactoring the exclude subcommand to simplify its internal matching/indexing approach, add stdin support, and introduce an optional memory guard for the in-memory side of the operation.
Changes:
- Removed the unused random-access/indexing machinery in
exclude, replacing it with a straightforward in-memory value set for<input2>. - Added stdin support (
-) for either<input1>or<input2>(but not both), plus a--memcheckguard for<input2>when it is a file. - Expanded
excludetest coverage for column ranges, tab delimiters, stdin usage, both-stdin erroring, and mismatched selection sizes; regenerated help markdown.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
src/cmd/exclude.rs |
Simplifies exclusion logic to a HashSet-based value set, adds stdin support and a --memcheck option, and updates CLI help text. |
tests/test_exclude.rs |
Adds regression/feature tests for column-range selection, tab delimiter handling, stdin input, and new error cases. |
docs/help/exclude.md |
Regenerates user-facing help to document stdin usage and the new --memcheck flag. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Continues the recent sweep of review-driven cleanups on individual qsv subcommands (slice #3748, fetch/fetchpost #3747, schema #3746, frequency #3745).
ValueIndexmachinery. The random-access row-position index (idx,num_rows, byte-offset buffer,Indexed::open, the header-vs-no-headers seek branch, the unusedVec<usize>row tracker) was decorated#[allow(dead_code)]and never read byexclude. Collapsed to a plainHashSet<Vec<ByteString>>and dropped thebyteorder/crate::index::Indexeddeps from this file. Also removes the only.unwrap()in the hot path (row.position().unwrap()).reader_file()toreader_file_stdin()so either<input1>or<input2>can be-. Errors out cleanly if both are stdin.--memcheckflag.<input2>is fully loaded into memory; added a--memcheckflag andutil::mem_file_checkguard, matching the dedup convention.VALUE_SET_INITIAL_CAPACITYconst,get_row_keyhelper used at both sites,read_byte_recordloop replaces the#[allow(unused_assignments)]pattern, match block collapsed toif matched == invert.Test plan
cargo build --locked --bin qsv -F all_featurescargo t exclude -F all_features(11/11 pass)cargo clippy --bin qsv -F all_features -- -D warningsqsv --generate-help-mdregenerateddocs/help/exclude.mdprintf 'id\n1\n2\n3\n' | qsv exclude id - id <(printf 'id\n2\n')prints id=1,3🤖 Generated with Claude Code