Feature Request / Improvement
Pre-req: Arrow 17 (#929)
Details
pyarrow.dataset.Scanner.from_fragment's filter parameter now takes an Expression type in Arrow 17.
https://arrow.apache.org/docs/python/generated/pyarrow.dataset.Scanner.html#pyarrow.dataset.Scanner.from_fragment
We can try to push down both the user specified filters and the positional deletes.
|
# This will push down the query to Arrow. |
|
# But in case there are positional deletes, we have to apply them first |
|
filter=pyarrow_filter if not positional_deletes else None, |
and
|
if positional_deletes: |
|
# Create the mask of indices that we're interested in |
|
indices = _combine_positional_deletes(positional_deletes, current_index, current_index + len(batch)) |
|
batch = batch.take(indices) |
|
# Apply the user filter |
|
if pyarrow_filter is not None: |
|
# we need to switch back and forth between RecordBatch and Table |
|
# as Expression filter isn't yet supported in RecordBatch |
|
# https://github.com/apache/arrow/issues/39220 |
|
arrow_table = pa.Table.from_batches([batch]) |
|
arrow_table = arrow_table.filter(pyarrow_filter) |
Requires Arrow 17
Feature Request / Improvement
Pre-req: Arrow 17 (#929)
Details
pyarrow.dataset.Scanner.from_fragment'sfilterparameter now takes anExpressiontype in Arrow 17.https://arrow.apache.org/docs/python/generated/pyarrow.dataset.Scanner.html#pyarrow.dataset.Scanner.from_fragment
We can try to push down both the user specified filters and the positional deletes.
iceberg-python/pyiceberg/io/pyarrow.py
Lines 1232 to 1234 in f05b1ae
and
iceberg-python/pyiceberg/io/pyarrow.py
Lines 1243 to 1253 in f05b1ae
Requires Arrow 17
RecordBatch.filterto take anExpressionin addition to a boolean maskArrayarrow#39220