Feature Request / Improvement
Support partitioned writes
So I think we want to tackle the static overwrite first, and then we can compute the predicate for the dynamic overwrite on top of that. We can come up with a separate API. I haven't really thought this through, and we can still change this.
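To make the relationship between the two modes concrete, here is a minimal sketch in plain Python, not the PyIceberg API. It assumes identity partitioning and uses rows as a stand-in for data files; the names `static_overwrite`, `dynamic_overwrite`, and `touched_partitions` are hypothetical. The point is only that a dynamic overwrite can be expressed as a static overwrite whose predicate is computed from the incoming data.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Row = Dict[str, Any]
# A partition is a tuple of (column, value) pairs, e.g. (("day", "tue"),).
Partition = Tuple[Tuple[str, Any], ...]


def static_overwrite(
    existing: Sequence[Row], incoming: Sequence[Row], predicate: Callable[[Row], bool]
) -> List[Row]:
    """Drop everything the caller-supplied predicate matches, then append the new rows."""
    kept = [row for row in existing if not predicate(row)]
    return kept + list(incoming)


def touched_partitions(rows: Sequence[Row], partition_cols: Sequence[str]) -> List[Partition]:
    """Return the distinct partitions present in the rows, in first-seen order."""
    seen: Dict[Partition, None] = {}
    for row in rows:
        key = tuple((col, row[col]) for col in partition_cols)
        seen.setdefault(key, None)
    return list(seen)


def dynamic_overwrite(
    existing: Sequence[Row], incoming: Sequence[Row], partition_cols: Sequence[str]
) -> List[Row]:
    """Compute the predicate from the incoming partitions and reuse the static path."""
    partitions = touched_partitions(incoming, partition_cols)

    def predicate(row: Row) -> bool:
        # True when the row belongs to any partition that the new data touches.
        return any(all(row[col] == value for col, value in p) for p in partitions)

    return static_overwrite(existing, incoming, predicate)
```

With `existing` holding rows for `mon` and `tue` and `incoming` holding only `tue` rows, `dynamic_overwrite(existing, incoming, ["day"])` replaces the `tue` rows and leaves `mon` untouched.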
I think the most important step is the breakdown of the work. There is a lot involved, but luckily we already have the test suite from the full overwrite.
Steps I can see:
Other things on my mind:
- In Iceberg, some files can still be on an older partitioning. We should make sure that we handle those correctly based on the predicate that we provide.
- How to handle delete files: the delete files might become unrelated because the affected data files are replaced. We could ignore this at first.
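One possible way to reason about files written under an older partitioning is sketched below. This is an assumption-laden illustration, not PyIceberg code: `DataFile` and `classify` are hypothetical, the predicate is limited to a conjunction of equalities, and all partition fields use identity transforms. A file can be dropped on metadata alone only when its own partition values cover every field in the predicate; otherwise its rows would have to be inspected.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Mapping, Sequence


@dataclass
class DataFile:
    path: str
    # Partition values under the spec the file was written with.
    # Older specs may lack fields that the current spec has.
    partition: Mapping[str, Any]


def classify(files: Sequence[DataFile], predicate: Mapping[str, Any]) -> Dict[str, List[str]]:
    """Split files into those to drop, keep, or inspect row by row."""
    result: Dict[str, List[str]] = {"drop": [], "keep": [], "inspect": []}
    for f in files:
        known = {k: v for k, v in predicate.items() if k in f.partition}
        if any(f.partition[k] != v for k, v in known.items()):
            # At least one known partition value contradicts the predicate,
            # so no row in this file can match.
            result["keep"].append(f.path)
        elif len(known) == len(predicate):
            # Every predicate field is covered and matches: all rows match.
            result["drop"].append(f.path)
        else:
            # The file's spec lacks a predicate field: some rows may match.
            result["inspect"].append(f.path)
    return result
```

For example, with the predicate `{"day": "mon", "region": "eu"}`, a file partitioned only by `day` with the value `mon` lands in `inspect`, because the metadata cannot tell which regions it contains.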
The good part:
- In PyIceberg we're going to ignore fast-appends at first (a fast-append is when you create a new manifest and add it to the manifest list). Instead, we'll just take the existing manifest(s) and rewrite them into a single new manifest, which makes it a bit easier to reason about the snapshot (and therefore the snapshot summaries). The reason is that fast-appends caused quite a few bugs in Java, and they can always be added later.
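The difference between the two append strategies can be illustrated with a toy model. This is a simplified sketch, not the Iceberg metadata layout: `Manifest`, `Snapshot`, `fast_append`, and `merge_append` are hypothetical names, and a manifest is reduced to a list of data file paths.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Manifest:
    data_files: List[str]


@dataclass
class Snapshot:
    manifests: List[Manifest]  # stands in for the manifest list


def fast_append(snapshot: Snapshot, new_files: Sequence[str]) -> Snapshot:
    """Keep the existing manifests and add one new manifest for the new files."""
    return Snapshot(manifests=[*snapshot.manifests, Manifest(list(new_files))])


def merge_append(snapshot: Snapshot, new_files: Sequence[str]) -> Snapshot:
    """Rewrite the existing manifests plus the new files into a single manifest."""
    merged = [path for manifest in snapshot.manifests for path in manifest.data_files]
    merged.extend(new_files)
    return Snapshot(manifests=[Manifest(merged)])
```

In this model `merge_append` always yields a snapshot with exactly one manifest, so everything the snapshot contains can be read from a single place, whereas `fast_append` grows the manifest list by one entry per commit.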