Upsert with list type not supported #1711

@Fokko

Description

Apache Iceberg version

None

Please describe the bug 🐞

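The PyArrow join below fails when a non-key column has a list type, which is what blocks upsert on tables with list columns: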

➜  iceberg-python git:(fd-align-codestyle) ipython                                    
Python 3.10.14 (main, Mar 19 2024, 21:46:16) [Clang 15.0.0 (clang-1500.3.9.4)]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.31.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import pyarrow as pa
   ...: 
   ...: arrow_schema = pa.schema(
   ...:     [
   ...:         pa.field("city", pa.string(), nullable=False),
   ...:         pa.field("tags", pa.list_(pa.string()), nullable=False),
   ...:     ]
   ...: )
   ...: 
   ...: # Write some data
   ...: df = pa.Table.from_pylist(
   ...:     [
   ...:         {"city": "Amsterdam", "tags": ["Europe", "Capital"]},
   ...:         {"city": "San Francisco", "tags": ["Amsterdam", "Golden Gate"]},
   ...:     ],
   ...:     schema=arrow_schema,
   ...: )
   ...: joined = df.join(df, "city", join_type="inner")
---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
Cell In[1], line 18
     10 # Write some data
     11 df = pa.Table.from_pylist(
     12     [
     13         {"city": "Amsterdam", "tags": ["Europe", "Capital"]},
   (...)
     16     schema=arrow_schema,
     17 )
---> 18 joined = df.join(df, "city", join_type="inner")

File /opt/homebrew/lib/python3.10/site-packages/pyarrow/table.pxi:5704, in pyarrow.lib.Table.join()

File /opt/homebrew/lib/python3.10/site-packages/pyarrow/acero.py:249, in _perform_join(join_type, left_operand, left_keys, right_operand, right_keys, left_suffix, right_suffix, use_threads, coalesce_keys, output_type)
    244     projection = Declaration(
    245         "project", ProjectNodeOptions(projections, projected_col_names)
    246     )
    247     decl = Declaration.from_sequence([decl, projection])
--> 249 result_table = decl.to_table(use_threads=use_threads)
    251 if output_type == Table:
    252     return result_table

File /opt/homebrew/lib/python3.10/site-packages/pyarrow/_acero.pyx:590, in pyarrow._acero.Declaration.to_table()

File /opt/homebrew/lib/python3.10/site-packages/pyarrow/error.pxi:155, in pyarrow.lib.pyarrow_internal_check_status()

File /opt/homebrew/lib/python3.10/site-packages/pyarrow/error.pxi:92, in pyarrow.lib.check_status()

ArrowInvalid: Data type list<item: string> is not supported in join non-key field tags

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
