Skip to content

perf: Coalesce page fetches when RowSelection selects all rows#9578

Merged
Dandandan merged 1 commit into
apache:mainfrom
Dandandan:skip_rowselection
Mar 19, 2026
Merged

perf: Coalesce page fetches when RowSelection selects all rows#9578
Dandandan merged 1 commit into
apache:mainfrom
Dandandan:skip_rowselection

Conversation

@Dandandan

Copy link
Copy Markdown
Contributor

Summary

  • When a RowSelection selects every row in a row group, fetch_ranges now treats it as no selection, producing a single whole-column-chunk I/O request instead of N individual page requests
  • This reduces the number of I/O requests for subsequent filter predicates when an earlier predicate passes all rows

Details

In InMemoryRowGroup::fetch_ranges, when both a RowSelection and an OffsetIndex are present, the code enters a page-level fetch path that uses scan_ranges() to produce individual page ranges. Even when the selection covers all rows, this produces N separate ranges (one per page).

The fix: before entering the page-level path, check if the selection's row_count() equals the row group's total row count. If so, drop the selection and take the simpler whole-column-chunk path.

This commonly happens when a multi-predicate RowFilter has an early predicate that passes all rows in a row group (e.g., CounterID = 62 on a row group where all rows have CounterID = 62).

Test plan

  • Existing tests pass (snapshot updated to reflect fewer I/O requests)
  • test_read_multiple_row_filter verifies the coalesced fetch pattern

🤖 Generated with Claude Code

@github-actions github-actions Bot added the parquet Changes to the parquet crate label Mar 19, 2026
@Dandandan

Copy link
Copy Markdown
Contributor Author

run benchmark arrow_reader_clickbench

@adriangbot

Copy link
Copy Markdown

🤖 Arrow criterion benchmark running (GKE) | trigger
Linux bench-c4088486771-451-rtrkn 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing skip_rowselection (f041c85) to 88422cb (merge-base) diff
BENCH_NAME=arrow_reader_clickbench
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_reader_clickbench
BENCH_FILTER=
Results will be posted here when complete

@Dandandan

Copy link
Copy Markdown
Contributor Author

Probably won't show up in benchmark, but still would be nice.
A more general solution would be https://github.com/apache/arrow-rs/pull/9414/changes perhaps

@Dandandan Dandandan force-pushed the skip_rowselection branch 2 times, most recently from 08fbe2c to 97bc46b Compare March 19, 2026 08:19
@Dandandan Dandandan marked this pull request as ready for review March 19, 2026 08:21
When a predicate selects all rows and there is no prior selection,
skip creating a RowSelection entirely. This avoids the allocation in
RowSelection::from_filters and keeps the selection as None, which
enables coalesced page fetches for subsequent predicates.

Merging adjacent ranges in scan_ranges is not feasible because the
Sparse column chunk format requires individual page offsets for lookup.
Instead, this avoids entering the per-page path altogether.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@adriangbot

Copy link
Copy Markdown

🤖 Arrow criterion benchmark completed (GKE) | trigger

Details

group                                             main                                   skip_rowselection
-----                                             ----                                   -----------------
arrow_reader_clickbench/async/Q1                  1.00   1090.7±5.23µs        ? ?/sec    1.00   1096.2±4.57µs        ? ?/sec
arrow_reader_clickbench/async/Q10                 1.00      6.7±0.14ms        ? ?/sec    1.00      6.7±0.12ms        ? ?/sec
arrow_reader_clickbench/async/Q11                 1.00      7.6±0.04ms        ? ?/sec    1.02      7.8±0.21ms        ? ?/sec
arrow_reader_clickbench/async/Q12                 1.00     14.4±0.20ms        ? ?/sec    1.00     14.4±0.06ms        ? ?/sec
arrow_reader_clickbench/async/Q13                 1.00     16.9±0.12ms        ? ?/sec    1.01     17.1±0.21ms        ? ?/sec
arrow_reader_clickbench/async/Q14                 1.00     15.8±0.18ms        ? ?/sec    1.01     16.0±0.22ms        ? ?/sec
arrow_reader_clickbench/async/Q19                 1.01      3.0±0.03ms        ? ?/sec    1.00      3.0±0.03ms        ? ?/sec
arrow_reader_clickbench/async/Q20                 1.00     75.9±4.40ms        ? ?/sec    1.10    83.3±10.12ms        ? ?/sec
arrow_reader_clickbench/async/Q21                 1.00     96.2±0.73ms        ? ?/sec    1.09    104.5±1.94ms        ? ?/sec
arrow_reader_clickbench/async/Q22                 1.00    128.3±5.83ms        ? ?/sec    1.03    131.7±5.07ms        ? ?/sec
arrow_reader_clickbench/async/Q23                 1.01    241.7±4.19ms        ? ?/sec    1.00    239.2±1.78ms        ? ?/sec
arrow_reader_clickbench/async/Q24                 1.00     19.3±0.31ms        ? ?/sec    1.01     19.4±0.19ms        ? ?/sec
arrow_reader_clickbench/async/Q27                 1.01     57.3±0.49ms        ? ?/sec    1.00     56.9±0.39ms        ? ?/sec
arrow_reader_clickbench/async/Q28                 1.00     57.3±0.55ms        ? ?/sec    1.00     57.0±0.35ms        ? ?/sec
arrow_reader_clickbench/async/Q30                 1.00     18.6±0.11ms        ? ?/sec    1.00     18.6±0.04ms        ? ?/sec
arrow_reader_clickbench/async/Q36                 1.02     15.2±0.38ms        ? ?/sec    1.00     14.8±0.35ms        ? ?/sec
arrow_reader_clickbench/async/Q37                 1.01      5.5±0.04ms        ? ?/sec    1.00      5.5±0.08ms        ? ?/sec
arrow_reader_clickbench/async/Q38                 1.06     13.6±0.30ms        ? ?/sec    1.00     12.9±0.30ms        ? ?/sec
arrow_reader_clickbench/async/Q39                 1.06     25.0±0.57ms        ? ?/sec    1.00     23.6±0.31ms        ? ?/sec
arrow_reader_clickbench/async/Q40                 1.00      5.8±0.10ms        ? ?/sec    1.00      5.8±0.07ms        ? ?/sec
arrow_reader_clickbench/async/Q41                 1.00      5.0±0.02ms        ? ?/sec    1.02      5.1±0.05ms        ? ?/sec
arrow_reader_clickbench/async/Q42                 1.00      3.6±0.04ms        ? ?/sec    1.00      3.6±0.03ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q1     1.00   1059.6±6.15µs        ? ?/sec    1.02   1077.8±6.00µs        ? ?/sec
arrow_reader_clickbench/async_object_store/Q10    1.01      6.5±0.03ms        ? ?/sec    1.00      6.5±0.04ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q11    1.00      7.7±0.17ms        ? ?/sec    1.01      7.8±0.19ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q12    1.00     14.3±0.19ms        ? ?/sec    1.01     14.5±0.25ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q13    1.00     16.6±0.05ms        ? ?/sec    1.00     16.7±0.07ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q14    1.00     15.6±0.04ms        ? ?/sec    1.03     16.1±0.34ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q19    1.00      2.9±0.04ms        ? ?/sec    1.01      2.9±0.03ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q20    1.01     70.8±0.46ms        ? ?/sec    1.00     70.3±0.18ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q21    1.00     79.4±0.60ms        ? ?/sec    1.00     79.4±0.36ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q22    1.00     96.0±0.64ms        ? ?/sec    1.00     95.9±0.49ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q23    1.10    228.9±2.13ms        ? ?/sec    1.00    208.5±1.57ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q24    1.00     18.9±0.12ms        ? ?/sec    1.01     19.0±0.07ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q27    1.01     56.3±0.56ms        ? ?/sec    1.00     55.9±0.36ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q28    1.00     56.6±0.66ms        ? ?/sec    1.00     56.5±0.53ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q30    1.00     18.1±0.06ms        ? ?/sec    1.01     18.2±0.05ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q36    1.02     14.7±0.46ms        ? ?/sec    1.00     14.4±0.48ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q37    1.00      5.4±0.06ms        ? ?/sec    1.00      5.4±0.06ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q38    1.03     12.5±0.16ms        ? ?/sec    1.00     12.1±0.13ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q39    1.04     23.6±0.51ms        ? ?/sec    1.00     22.7±0.35ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q40    1.00      5.4±0.03ms        ? ?/sec    1.01      5.5±0.07ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q41    1.01      4.9±0.06ms        ? ?/sec    1.00      4.8±0.02ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q42    1.00      3.4±0.01ms        ? ?/sec    1.00      3.4±0.03ms        ? ?/sec
arrow_reader_clickbench/sync/Q1                   1.00   871.2±12.90µs        ? ?/sec    1.00    872.0±3.15µs        ? ?/sec
arrow_reader_clickbench/sync/Q10                  1.01      5.2±0.05ms        ? ?/sec    1.00      5.1±0.04ms        ? ?/sec
arrow_reader_clickbench/sync/Q11                  1.00      6.1±0.05ms        ? ?/sec    1.00      6.1±0.06ms        ? ?/sec
arrow_reader_clickbench/sync/Q12                  1.01     21.8±0.61ms        ? ?/sec    1.00     21.5±0.13ms        ? ?/sec
arrow_reader_clickbench/sync/Q13                  1.00     28.8±0.99ms        ? ?/sec    1.05     30.2±0.69ms        ? ?/sec
arrow_reader_clickbench/sync/Q14                  1.00     23.2±0.23ms        ? ?/sec    1.17     27.1±0.70ms        ? ?/sec
arrow_reader_clickbench/sync/Q19                  1.01      2.7±0.03ms        ? ?/sec    1.00      2.6±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q20                  1.03    123.5±0.83ms        ? ?/sec    1.00    120.0±0.68ms        ? ?/sec
arrow_reader_clickbench/sync/Q21                  1.02     97.0±0.17ms        ? ?/sec    1.00     95.5±0.15ms        ? ?/sec
arrow_reader_clickbench/sync/Q22                  1.00    142.4±1.30ms        ? ?/sec    1.00    142.6±2.04ms        ? ?/sec
arrow_reader_clickbench/sync/Q23                  1.00   277.4±14.45ms        ? ?/sec    1.00   278.6±14.93ms        ? ?/sec
arrow_reader_clickbench/sync/Q24                  1.00     26.7±0.08ms        ? ?/sec    1.00     26.7±0.07ms        ? ?/sec
arrow_reader_clickbench/sync/Q27                  1.02    107.0±0.70ms        ? ?/sec    1.00    105.2±0.58ms        ? ?/sec
arrow_reader_clickbench/sync/Q28                  1.01    104.2±0.58ms        ? ?/sec    1.00    102.7±0.48ms        ? ?/sec
arrow_reader_clickbench/sync/Q30                  1.01     18.9±0.18ms        ? ?/sec    1.00     18.7±0.16ms        ? ?/sec
arrow_reader_clickbench/sync/Q36                  1.00     22.1±0.30ms        ? ?/sec    1.00     22.2±0.35ms        ? ?/sec
arrow_reader_clickbench/sync/Q37                  1.00      6.9±0.03ms        ? ?/sec    1.00      6.9±0.01ms        ? ?/sec
arrow_reader_clickbench/sync/Q38                  1.00     11.2±0.09ms        ? ?/sec    1.00     11.2±0.15ms        ? ?/sec
arrow_reader_clickbench/sync/Q39                  1.01     20.9±0.33ms        ? ?/sec    1.00     20.7±0.25ms        ? ?/sec
arrow_reader_clickbench/sync/Q40                  1.02      5.2±0.08ms        ? ?/sec    1.00      5.1±0.01ms        ? ?/sec
arrow_reader_clickbench/sync/Q41                  1.01      5.7±0.07ms        ? ?/sec    1.00      5.6±0.05ms        ? ?/sec
arrow_reader_clickbench/sync/Q42                  1.01      4.3±0.02ms        ? ?/sec    1.00      4.3±0.02ms        ? ?/sec

Resource Usage

base (merge-base)

Metric Value
Wall time 781.9s
Peak memory 3.1 GiB
Avg memory 2.9 GiB
CPU user 709.0s
CPU sys 72.6s
Disk read 0 B
Disk write 745.8 MiB

branch

Metric Value
Wall time 779.2s
Peak memory 3.2 GiB
Avg memory 3.1 GiB
CPU user 710.4s
CPU sys 68.6s
Disk read 0 B
Disk write 171.3 MiB

@alamb

alamb commented Mar 19, 2026

Copy link
Copy Markdown
Contributor

Probably won't show up in benchmark, but still would be nice. A more general solution would be https://github.com/apache/arrow-rs/pull/9414/changes perhaps

Should I spend time reviewing that one instead / in addition? I am losing track of all the PRs

@alamb alamb left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a good improvement to me

};
}

// If the predicate selected all rows and there is no prior selection,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@Dandandan Dandandan merged commit 44f5dfc into apache:main Mar 19, 2026
16 checks passed
@Dandandan

Copy link
Copy Markdown
Contributor Author

Probably won't show up in benchmark, but still would be nice. A more general solution would be https://github.com/apache/arrow-rs/pull/9414/changes perhaps

Should I spend time reviewing that one instead / in addition? I am losing track of all the PRs

Probably won't show up in benchmark, but still would be nice. A more general solution would be https://github.com/apache/arrow-rs/pull/9414/changes perhaps

Should I spend time reviewing that one instead / in addition? I am losing track of all the PRs

Perhaps I/we want to first test it in DataFusion? @adriangb also wanted to do aptive filtering which could be similar or work together.

@adriangb

Copy link
Copy Markdown
Contributor

Probably won't show up in benchmark, but still would be nice. A more general solution would be https://github.com/apache/arrow-rs/pull/9414/changes perhaps

Should I spend time reviewing that one instead / in addition? I am losing track of all the PRs

Probably won't show up in benchmark, but still would be nice. A more general solution would be https://github.com/apache/arrow-rs/pull/9414/changes perhaps

Should I spend time reviewing that one instead / in addition? I am losing track of all the PRs

Perhaps I/we want to first test it in DataFusion? @adriangb also wanted to do aptive filtering which could be similar or work together.

My intuition (substantiated by some now somewhat outdated benchmarks) is that with morselization and using morsels as the unit of adaptivity we can and probably should do all of the adaptive filter execution in DataFusion. I plan to fast follow the morselization work with the adaptive filtering work because (1) it will work much better with the smaller units of adaptivity, (2) I don't want to bog reviewers down with too many options but (3) I need to make sure that any API changes needed (I expect few to none) can be done before the API changes from morsels are set in stone.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

parquet Changes to the parquet crate performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants