Skip to content

Support dynamic overwrite #1287

@kevinjqliu

Description

@kevinjqliu

Feature Request / Improvement

Currently overwrite consists of a delete + append operation.

self.delete(delete_filter=overwrite_filter, snapshot_properties=snapshot_properties)
with self.update_snapshot(snapshot_properties=snapshot_properties).fast_append() as update_snapshot:
# skip writing data files if the dataframe is empty
if df.shape[0] > 0:
data_files = _dataframe_to_data_files(
table_metadata=self.table_metadata, write_uuid=update_snapshot.commit_uuid, df=df, io=self._table.io
)
for data_file in data_files:
update_snapshot.append_data_file(data_file)

As an optimization, we can support dynamic overwrite for when an entire partition is replaced.

Heres an example from @koenvo
https://gist.github.com/koenvo/e23bfab32c7e7810eb52f82c6304fc22

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions