Skip to content

refactor(stats): shrink and tidy WhichStats#3822

Merged
jqnatividad merged 2 commits into
masterfrom
microoptimize-stats-whichstats-struct
May 5, 2026
Merged

refactor(stats): shrink and tidy WhichStats#3822
jqnatividad merged 2 commits into
masterfrom
microoptimize-stats-whichstats-struct

Conversation

@jqnatividad

Copy link
Copy Markdown
Collaborator

Summary

  • Change WhichStats::percentile_list from String to Box<str>. The field is set once from the CLI flag and never mutated, so the unused capacity word is wasted. Shrinks WhichStats from 40 → 32 bytes and is serde-transparent (no change to the on-disk <file>.stats.csv.json cache format; reading old caches still works).
  • Move use_weights above percentile_list so all 12 bool fields form a contiguous block with the variable-size Box<str> at the end. Source order has no runtime effect (WhichStats is repr(Rust)); purely a readability nit.

Performance note

This is not a perf win — framed as refactor/style for that reason. Benchmarked on a 514 MB / 1M-row NYC_311_SR_2010-2020-sample-1M.csv (M4 Max, 10 runs each, --force to bypass cache):

Workload Before After
stats 1.416 s ± 0.007 1.422 s ± 0.005
stats --everything 2.181 s ± 0.015 2.199 s ± 0.008

Both deltas are inside measurement noise. The theoretical cache-line argument (sub-cache-line spillover from which: WhichStats into Stats' second cache line) doesn't materialize, likely because Stats is #[repr(C, align(64))] and the 8 saved bytes get absorbed into existing tail padding, plus the bool reads are extremely well-predicted per column.

Test plan

  • cargo build --locked --bin qsv -F all_features
  • cargo clippy -F all_features --bin qsv -- -D warnings
  • cargo t stats -F all_features — all 728 stats tests pass
  • Verified stats cache JSON file remains structurally compatible (serde matches by name on read)

🤖 Generated with Claude Code

jqnatividad and others added 2 commits May 5, 2026 06:49
The percentile list is set once from the CLI flag and never mutated
afterward, so a heap-allocated owned String is overkill. Box<str> drops
the unused capacity word, shrinking WhichStats from 40 to 32 bytes
while remaining serde-transparent (no change to the on-disk
.stats.csv.json cache format). Benchmarked on a 514 MB / 1M-row CSV
with both `stats` and `stats --everything`; runtime is unchanged within
measurement noise (this is a cleanup, not a perf win).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move use_weights above percentile_list so all 12 bool fields form a
contiguous block with the variable-size Box<str> at the end. Source
order has no runtime effect (WhichStats is repr(Rust) and the compiler
already picks an optimal layout); this is purely a readability nit and
mirrors the change in Args::which_stats so the constructor matches the
struct definition.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codacy-production

Copy link
Copy Markdown

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

@jqnatividad jqnatividad merged commit 6840fb8 into master May 5, 2026
15 of 16 checks passed
@jqnatividad jqnatividad deleted the microoptimize-stats-whichstats-struct branch May 5, 2026 11:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant