The initial idea came in 2017 while evaluating the parquet specification for a project. In
particular I got to the Delta Encoding description and noted that the format was
<min delta> <list of bitwidths of miniblocks> <miniblocks>. Thinking about how a column
traversal for a scan would occur, the block would be expanded into memory, scanned, and then
tossed. However, there is already enough information in the header to determine if a value
falls between the min and max of the miniblocks in this block — rather than decompressing and
comparing, why not compress the predicate and compare against the compressed data?
I didn't have time to explore the idea at the time, but I continued to be curious whether it'd
been implemented as part of the predicate pushdown work. Recently I explored this with Claude,
giving it a research task to see if the idea had been implemented. Informed that it hadn't,
we worked together on a test project to see if the idea had any merit. Satisfied that it worked,
we then integrated it into the codebase. The API change (Decoder::scan_filtered) is isolated
to the final PR so it can be targeted independently.
This first PR addresses the simplest case: bw=0 miniblocks in the non-terminal skip path.
In DeltaBitPackDecoder::skip(), miniblocks with bit_width=0 currently call
get_batch and iterate over 32 or 64 values even though every delta in the
miniblock equals min_delta exactly (remainders are all zero). No bit reads are
needed; the only state update is advancing last_value by n * min_delta.
Proposed fix: When bit_width == 0 in the non-terminal skip path, replace
the get_batch call with a single wrapping_mul + wrapping_add. When
min_delta == 0 the loop body is a no-op and can be skipped entirely.
Measured improvement (arrow_reader bench, vs upstream HEAD):
- bw=0 single-value skip: -21.6%
- bw=0 increasing-value skip: -24.3%
The initial idea came in 2017 while evaluating the parquet specification for a project. In
particular I got to the Delta Encoding description and noted that the format was
<min delta> <list of bitwidths of miniblocks> <miniblocks>. Thinking about how a columntraversal for a scan would occur, the block would be expanded into memory, scanned, and then
tossed. However, there is already enough information in the header to determine if a value
falls between the min and max of the miniblocks in this block — rather than decompressing and
comparing, why not compress the predicate and compare against the compressed data?
I didn't have time to explore the idea at the time, but I continued to be curious whether it'd
been implemented as part of the predicate pushdown work. Recently I explored this with Claude,
giving it a research task to see if the idea had been implemented. Informed that it hadn't,
we worked together on a test project to see if the idea had any merit. Satisfied that it worked,
we then integrated it into the codebase. The API change (
Decoder::scan_filtered) is isolatedto the final PR so it can be targeted independently.
This first PR addresses the simplest case: bw=0 miniblocks in the non-terminal skip path.
In
DeltaBitPackDecoder::skip(), miniblocks withbit_width=0currently callget_batchand iterate over 32 or 64 values even though every delta in theminiblock equals
min_deltaexactly (remainders are all zero). No bit reads areneeded; the only state update is advancing
last_valuebyn * min_delta.Proposed fix: When
bit_width == 0in the non-terminal skip path, replacethe
get_batchcall with a singlewrapping_mul+wrapping_add. Whenmin_delta == 0the loop body is a no-op and can be skipped entirely.Measured improvement (arrow_reader bench, vs upstream HEAD):