Skip to content

exclude: review-driven cleanup, stdin support and memcheck#3749

Merged
jqnatividad merged 2 commits into
masterfrom
exclude-codereview-202604
Apr 26, 2026
Merged

exclude: review-driven cleanup, stdin support and memcheck#3749
jqnatividad merged 2 commits into
masterfrom
exclude-codereview-202604

Conversation

@jqnatividad

Copy link
Copy Markdown
Collaborator

Summary

Continues the recent sweep of review-driven cleanups on individual qsv subcommands (slice #3748, fetch/fetchpost #3747, schema #3746, frequency #3745).

  • Strip dead ValueIndex machinery. The random-access row-position index (idx, num_rows, byte-offset buffer, Indexed::open, the header-vs-no-headers seek branch, the unused Vec<usize> row tracker) was decorated #[allow(dead_code)] and never read by exclude. Collapsed to a plain HashSet<Vec<ByteString>> and dropped the byteorder/crate::index::Indexed deps from this file. Also removes the only .unwrap() in the hot path (row.position().unwrap()).
  • Stdin support. Switched both inputs from reader_file() to reader_file_stdin() so either <input1> or <input2> can be -. Errors out cleanly if both are stdin.
  • --memcheck flag. <input2> is fully loaded into memory; added a --memcheck flag and util::mem_file_check guard, matching the dedup convention.
  • Polish. Named VALUE_SET_INITIAL_CAPACITY const, get_row_key helper used at both sites, read_byte_record loop replaces the #[allow(unused_assignments)] pattern, match block collapsed to if matched == invert.
  • Tests. Added column-range, tab delimiter, stdin, both-stdin error, and mismatched-column-count coverage (6 → 11 tests).

Test plan

  • cargo build --locked --bin qsv -F all_features
  • cargo t exclude -F all_features (11/11 pass)
  • cargo clippy --bin qsv -F all_features -- -D warnings
  • qsv --generate-help-md regenerated docs/help/exclude.md
  • Manual: printf 'id\n1\n2\n3\n' | qsv exclude id - id <(printf 'id\n2\n') prints id=1,3

🤖 Generated with Claude Code

Strip dead random-access index machinery from ValueIndex (idx, num_rows,
row-position byte buffer, Indexed::open, header-vs-no-headers seek branch,
the unused Vec<usize> row tracker, and byteorder/Indexed deps). Only the
value lookup is ever consumed, so collapse to a HashSet<Vec<ByteString>>.

Switch both inputs from reader_file() to reader_file_stdin() so either
input can be `-`. Fail if both are stdin. Add --memcheck flag and call
util::mem_file_check on input2 (which is fully loaded into memory),
matching the dedup convention.

Polish: named VALUE_SET_INITIAL_CAPACITY const, get_row_key helper used
at both sites, read_byte_record loop replaces the unused_assignments
pattern, match collapsed to `if matched == invert`.

Tests: add column-range, tab delimiter, stdin, both-stdin error, and
mismatched-column-count coverage (6 -> 11 tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codacy-production

codacy-production Bot commented Apr 26, 2026

Copy link
Copy Markdown

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics 14 complexity

Metric Results
Complexity 14

View in Codacy

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR continues the review-driven cleanup sweep by refactoring the exclude subcommand to simplify its internal matching/indexing approach, add stdin support, and introduce an optional memory guard for the in-memory side of the operation.

Changes:

  • Removed the unused random-access/indexing machinery in exclude, replacing it with a straightforward in-memory value set for <input2>.
  • Added stdin support (-) for either <input1> or <input2> (but not both), plus a --memcheck guard for <input2> when it is a file.
  • Expanded exclude test coverage for column ranges, tab delimiters, stdin usage, both-stdin erroring, and mismatched selection sizes; regenerated help markdown.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
src/cmd/exclude.rs Simplifies exclusion logic to a HashSet-based value set, adds stdin support and a --memcheck option, and updates CLI help text.
tests/test_exclude.rs Adds regression/feature tests for column-range selection, tab delimiter handling, stdin input, and new error cases.
docs/help/exclude.md Regenerates user-facing help to document stdin usage and the new --memcheck flag.

Comment thread src/cmd/exclude.rs Outdated
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@jqnatividad jqnatividad merged commit e1a09da into master Apr 26, 2026
13 of 16 checks passed
@jqnatividad jqnatividad deleted the exclude-codereview-202604 branch April 26, 2026 15:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants