parquet: O(1) skip for bw=0 miniblocks in DeltaBitPackDecoder

The initial idea came in 2017 while evaluating the parquet specification for a project. In
particular I got to the Delta Encoding description and noted that the format was
`<min delta> <list of bitwidths of miniblocks> <miniblocks>`. Thinking about how a column
traversal for a scan would occur, the block would be expanded into memory, scanned, and then
tossed. However, there is already enough information in the header to determine if a value
falls between the min and max of the miniblocks in this block — rather than decompressing and
comparing, why not compress the predicate and compare against the compressed data?

I didn't have time to explore the idea at the time, but I continued to be curious whether it'd
been implemented as part of the predicate pushdown work. Recently I explored this with Claude,
giving it a research task to see if the idea had been implemented. Informed that it hadn't,
we worked together on a test project to see if the idea had any merit. Satisfied that it worked,
we then integrated it into the codebase. The API change (`Decoder::scan_filtered`) is isolated
to the final PR so it can be targeted independently.

This first PR addresses the simplest case: bw=0 miniblocks in the non-terminal skip path.

----

In `DeltaBitPackDecoder::skip()`, miniblocks with `bit_width=0` currently call
`get_batch` and iterate over 32 or 64 values even though every delta in the
miniblock equals `min_delta` exactly (remainders are all zero). No bit reads are
needed; the only state update is advancing `last_value` by `n * min_delta`.

**Proposed fix:** When `bit_width == 0` in the non-terminal skip path, replace
the `get_batch` call with a single `wrapping_mul` + `wrapping_add`. When
`min_delta == 0` the loop body is a no-op and can be skipped entirely.

**Measured improvement (arrow_reader bench, vs upstream HEAD):**
- bw=0 single-value skip: -21.6%
- bw=0 increasing-value skip: -24.3%

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

parquet: O(1) skip for bw=0 miniblocks in DeltaBitPackDecoder #9783

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

parquet: O(1) skip for bw=0 miniblocks in DeltaBitPackDecoder #9783

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions