Warehouse in /tmp/tmpy69j5uf6
Traceback (most recent call last):
File "/home/lidavidm/Code/repro.py", line 80, in <module>
print(scan.to_arrow())
^^^^^^^^^^^^^^^
File "/home/lidavidm/Code/venv/lib/python3.12/site-packages/pyiceberg/table/__init__.py", line 1763, in to_arrow
).to_table(self.plan_files())
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/lidavidm/Code/venv/lib/python3.12/site-packages/pyiceberg/io/pyarrow.py", line 1575, in to_table
if table_result := future.result():
^^^^^^^^^^^^^^^
File "/home/lidavidm/miniforge3/lib/python3.12/concurrent/futures/_base.py", line 449, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "/home/lidavidm/miniforge3/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
File "/home/lidavidm/miniforge3/lib/python3.12/concurrent/futures/thread.py", line 59, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/lidavidm/Code/venv/lib/python3.12/site-packages/pyiceberg/io/pyarrow.py", line 1556, in _table_from_scan_task
batches = list(self._record_batches_from_scan_tasks_and_deletes([task], deletes_per_file))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/lidavidm/Code/venv/lib/python3.12/site-packages/pyiceberg/io/pyarrow.py", line 1637, in _record_batches_from_scan_tasks_and_deletes
for batch in batches:
^^^^^^^
File "/home/lidavidm/Code/venv/lib/python3.12/site-packages/pyiceberg/io/pyarrow.py", line 1441, in _task_to_record_batches
result_batch = result_batch.set_column(index, name, [value])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "pyarrow/table.pxi", line 2969, in pyarrow.lib.RecordBatch.set_column
File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Added column's length must match record batch's length. Expected length 3 but got length 1
Apache Iceberg version
main (development)
Please describe the bug 🐞
This happens on pyiceberg 0.9.0
iceberg-python/pyiceberg/io/pyarrow.py, line 1441 in e3a5c3b
This set_column call always tries to add a one-row column, which is wrong (and PyArrow rejects it). The added column must have the same length as the other columns in the record batch.