Is your feature request related to a problem or challenge? Please describe what you are trying to do.
After the great work from @hhhizzz in #8733, we will (finally) have the ability to use a Bitmask filter representation when applying filters during Parquet decode.
During review, we noticed that the code used to convert a RowSelection to a mask could likely be optimized further:
    fn boolean_mask_from_selectors(selectors: &[RowSelector]) -> BooleanBuffer {
        let total_rows: usize = selectors.iter().map(|s| s.row_count).sum();
        let mut builder = BooleanBufferBuilder::new(total_rows);
        for selector in selectors {
            builder.append_n(selector.row_count, !selector.skip);
        }
        builder.finish()
    }
Describe the solution you'd like
Make predicate evaluation faster by optimizing the conversion from a RowSelection to a mask.
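As one possible direction (a rough sketch, not what #8733 or #6624 actually do), the mask could be allocated zeroed up front so that skip runs cost nothing, with select runs filled a whole 64-bit word at a time. To keep the example self-contained, `RowSelector` below is a stand-in struct with only the two fields used above, a plain `Vec<u64>` stands in for `BooleanBuffer`, and the names `set_bit_range` and `mask_words_from_selectors` are hypothetical. Whether this beats `BooleanBufferBuilder::append_n` in practice would need benchmarking.

```rust
/// Stand-in for parquet's `RowSelector` (assumption: only these two fields matter here).
#[derive(Debug, Clone, Copy)]
struct RowSelector {
    row_count: usize,
    skip: bool,
}

/// Set bits `[start, start + len)` in an LSB-first bitmap stored as `u64` words.
fn set_bit_range(words: &mut [u64], start: usize, len: usize) {
    if len == 0 {
        return;
    }
    let end = start + len; // exclusive
    let first = start / 64;
    let last = (end - 1) / 64;
    // Bits at or above `start % 64` within the first word.
    let head = u64::MAX << (start % 64);
    // Bits at or below `(end - 1) % 64` within the last word.
    let tail = u64::MAX >> (63 - (end - 1) % 64);
    if first == last {
        words[first] |= head & tail;
    } else {
        words[first] |= head;
        // Words fully covered by the run are written in one store each.
        for w in &mut words[first + 1..last] {
            *w = u64::MAX;
        }
        words[last] |= tail;
    }
}

/// Hypothetical replacement for `boolean_mask_from_selectors`:
/// returns the packed mask words and the total number of rows.
fn mask_words_from_selectors(selectors: &[RowSelector]) -> (Vec<u64>, usize) {
    let total_rows: usize = selectors.iter().map(|s| s.row_count).sum();
    // Zero-initialised, so skip runs need no work at all.
    let mut words = vec![0u64; (total_rows + 63) / 64];
    let mut offset = 0;
    for s in selectors {
        if !s.skip {
            set_bit_range(&mut words, offset, s.row_count);
        }
        offset += s.row_count;
    }
    (words, total_rows)
}
```

For example, selecting 3 rows, skipping 2, then selecting 70 produces two words: the first has every bit set except bits 3 and 4, and the second has its low 11 bits set.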
Describe alternatives you've considered
@hhhizzz mentions in #8733 (comment):
"I'll learn the #6624 (comment) and try to improve it in the following PRs."
In other words, @XiangpengHao's PR has many tricks we could use.
Additional context
The snippet above is from arrow-rs/parquet/src/arrow/arrow_reader/selection.rs (lines 926 to 933 in 911331a).