pyarrow 18 regression: ValueError: type(schema)=<class 'pyarrow.lib.Schema'> #1265

@grihabor

Description

Apache Iceberg version

0.7.1 (latest release)

Please describe the bug 🐞

After upgrading pyarrow from 17.0.0 to 18.0.0, catalog.create_table() fails with the following error, even though the schema passed in is a pyarrow.Schema:

Traceback (most recent call last):
  File "/home/grihabor/projects/playground/pyiceberg-pyarrow-schema/./run.py", line 38, in <module>
    tbl = catalog.create_table(
          ^^^^^^^^^^^^^^^^^^^^^
  File "/home/grihabor/projects/playground/pyiceberg-pyarrow-schema/.venv/pyarrow-18/lib/python3.12/site-packages/pyiceberg/catalog/sql.py", line 193, in create_table
    schema: Schema = self._convert_schema_if_needed(schema)  # type: ignore
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/grihabor/projects/playground/pyiceberg-pyarrow-schema/.venv/pyarrow-18/lib/python3.12/site-packages/pyiceberg/catalog/__init__.py", line 732, in _convert_schema_if_needed
    raise ValueError(f"{type(schema)=}, but it must be pyiceberg.schema.Schema or pyarrow.Schema")
ValueError: type(schema)=<class 'pyarrow.lib.Schema'>, but it must be pyiceberg.schema.Schema or pyarrow.Schema
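The message contradicts itself: pyarrow.lib.Schema is the class exposed as pyarrow.Schema, yet it is rejected. One way such a message can arise is a helper that runs an optional-dependency import and its isinstance check inside one try block and swallows ModuleNotFoundError. That is an assumption about pyiceberg's internals and is not confirmed in this report. The sketch below uses hypothetical names and stand-in types (dict for the native schema, list for the Arrow schema) to show the pattern:

```python
import importlib
from typing import Any


def convert_schema_if_needed(schema: Any, backend_module: str = "_hypothetical_missing_backend") -> Any:
    """Hypothetical stand-in for a catalog's schema-conversion helper.

    dict stands in for the native schema type, list for the Arrow schema type.
    """
    if isinstance(schema, dict):
        return schema  # already the native type, nothing to convert
    try:
        # The optional-dependency import shares the try block with the type check.
        importlib.import_module(backend_module)
        if isinstance(schema, list):
            return {"converted": schema}
    except ModuleNotFoundError:
        pass  # the real cause (a missing module) is silently discarded here
    raise ValueError(f"{type(schema)=}, but it must be dict or list")


if __name__ == "__main__":
    print(convert_schema_if_needed(["city"], backend_module="json"))  # converts
    try:
        convert_schema_if_needed(["city"])  # backend import fails
    except ValueError as exc:
        print(exc)  # blames the input type, not the failed import
```

With an importable backend the list is converted. With a missing backend, the same list is rejected, and the error names an accepted type (<class 'list'>) instead of the failed import. This mirrors the contradiction in the traceback above.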

Steps to reproduce

run.py script:

import shutil
from pyiceberg.catalog.sql import SqlCatalog
from pathlib import Path
import pyarrow as pa

warehouse_path = Path("/tmp/warehouse")
if warehouse_path.exists():
    shutil.rmtree(warehouse_path)

warehouse_path.mkdir()
catalog = SqlCatalog(
    "default",
    **{
        "uri": f"sqlite:///{warehouse_path}/pyiceberg_catalog.db",
        "warehouse": f"file://{warehouse_path}",
    },
)


df = pa.Table.from_pylist(
    mapping=[
        {"city": "Amsterdam", "lat": 52.371807, "long": 4.896029},
        {"city": "San Francisco", "lat": 37.773972, "long": -122.431297},
        {"city": "Drachten", "lat": 53.11254, "long": 6.0989},
        {"city": "Paris", "lat": 48.864716, "long": 2.349014},
    ],
    schema=pa.schema(
        [
            ("city", pa.large_string()),
            ("lat", pa.float64()),
            ("long", pa.float64()),
        ]
    ),
)


catalog.create_namespace("db")
tbl = catalog.create_table(
    identifier="db.cities",
    schema=df.schema,
)
tbl.overwrite(df)
print(tbl.scan().to_arrow())

This works (pyarrow 17.0.0):

uv venv --python 3.12 .venv/pyarrow-17
source .venv/pyarrow-17/bin/activate
uv pip install pyiceberg pyarrow==17.0.0 sqlalchemy
python ./run.py

This raises the exception (pyarrow 18.0.0):

uv venv --python 3.12 .venv/pyarrow-18
source .venv/pyarrow-18/bin/activate
uv pip install pyiceberg pyarrow==18.0.0 sqlalchemy
python ./run.py
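If the ValueError is masking an import failure, importing the PyArrow integration module directly in the failing .venv/pyarrow-18 environment should surface the underlying traceback. This is a minimal diagnostic sketch. try_import is a hypothetical helper, and pyiceberg.io.pyarrow is assumed (not confirmed here) to be the module the catalog loads for Arrow schema conversion:

```python
import importlib
import traceback
from typing import Optional


def try_import(name: str) -> Optional[str]:
    """Import a module by name; return the formatted traceback on failure, else None."""
    try:
        importlib.import_module(name)
    except Exception:  # report any import-time failure, not just ModuleNotFoundError
        return traceback.format_exc()
    return None


if __name__ == "__main__":
    # Run inside the pyarrow-18 virtualenv from the reproduction steps.
    for module_name in ("pyarrow", "pyiceberg.io.pyarrow"):
        error = try_import(module_name)
        if error is None:
            print(f"{module_name}: imported OK")
        else:
            print(f"{module_name}: import failed\n{error}")
```

Comparing the output in the pyarrow-17 and pyarrow-18 environments would show whether an import-time error differs between the two.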
