[feature](iceberg) Support Partition Evolution DDL for Iceberg Tables #57972
3c485cb to 7d998d1
run buildall
zddr left a comment
Use regression-test/suites/mtmv_p0/test_iceberg_mtmv.groovy as a reference and add some partition evolution cases to validate the behavior of MTMV.
For example:
- Keeping the partition column but switching the transform from day to year should be allowed. The materialized view should refresh normally and generate correct partitions.
- Changing the partition column from c1 to c2 should cause the materialized view refresh to fail.
- Switching from year to identity should cause the materialized view refresh to fail.
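The three cases above can be written down as a small expectation table. The following is only a sketch under a stated assumption: it generalizes the reviewer's examples into the rule "the source column is unchanged and both the old and new transforms are time-based". That rule, the class `MtmvPartitionEvolutionCases`, and the record `Field` are hypothetical and are not taken from the Doris code.

```java
import java.util.Set;

// Hypothetical model of the reviewer's three MTMV cases; not Doris code.
final class MtmvPartitionEvolutionCases {
    // A partition field: a transform name applied to a source column.
    record Field(String transform, String column) {}

    private static final Set<String> TIME_TRANSFORMS = Set.of("year", "month", "day", "hour");

    // Assumed rule (a generalization of the examples, not the real check):
    // refresh is expected to succeed only when the column is unchanged and
    // both the old and the new transform are time-based.
    static boolean refreshExpectedToSucceed(Field before, Field after) {
        return before.column().equals(after.column())
                && TIME_TRANSFORMS.contains(before.transform())
                && TIME_TRANSFORMS.contains(after.transform());
    }

    public static void main(String[] args) {
        // day(c1) -> year(c1): same column, both time-based
        System.out.println(refreshExpectedToSucceed(new Field("day", "c1"), new Field("year", "c1")));
        // day(c1) -> day(c2): partition column changed
        System.out.println(refreshExpectedToSucceed(new Field("day", "c1"), new Field("day", "c2")));
        // year(c1) -> identity(c1): no longer a time-based transform
        System.out.println(refreshExpectedToSucceed(new Field("year", "c1"), new Field("identity", "c1")));
    }
}
```

A regression suite would assert the actual refresh outcome for each of these transitions rather than rely on a predicate like this one.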
run buildall
TPC-H: Total hot run time: 34247 ms
TPC-DS: Total hot run time: 188470 ms
ClickBench: Total hot run time: 27.43 s
FE UT Coverage Report: Increment line coverage
run buildall
TPC-H: Total hot run time: 34489 ms
TPC-DS: Total hot run time: 187845 ms
ClickBench: Total hot run time: 27.51 s
FE UT Coverage Report: Increment line coverage
57da735 to 26b5aef
26b5aef to f2cf736
run buildall
TPC-H: Total hot run time: 34708 ms
TPC-DS: Total hot run time: 187048 ms
Pull request overview
This PR implements support for partition evolution in Apache Iceberg tables, enabling users to dynamically modify table partition strategies through DDL operations (ADD/DROP/REPLACE PARTITION KEY) without rewriting data files. The implementation follows Spark SQL syntax for consistency and maintains backward compatibility with historical data.
Key Changes:
- Added three new ALTER TABLE operations for partition field management (ADD/DROP/REPLACE)
- Extended grammar with partition transform expressions supporting bucket, truncate, year, month, day, hour, and identity transforms
- Implemented Iceberg-specific metadata operations through IcebergMetadataOps
- Added comprehensive test coverage including DDL tests, query/write tests, and MTMV compatibility tests
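The metadata-only behaviour described in the overview can be illustrated with a toy model. This is a sketch, not Doris or Iceberg code: the class `PartitionSpecHistorySketch`, the record `DataFile`, and all method names are hypothetical, and `replaceField` is assumed here to behave like a drop followed by an add in a single new spec. It shows that each ADD/DROP/REPLACE produces a new spec version while files written earlier keep the spec id they were written with.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not Doris or Iceberg code: a table keeps every partition
// spec it has ever had, and each data file remembers the spec it was written with.
final class PartitionSpecHistorySketch {
    record DataFile(String path, int specId) {}

    private final List<List<String>> specs = new ArrayList<>();
    private final List<DataFile> files = new ArrayList<>();

    PartitionSpecHistorySketch(List<String> initialSpec) {
        specs.add(List.copyOf(initialSpec));
    }

    int currentSpecId() { return specs.size() - 1; }

    List<String> spec(int specId) { return specs.get(specId); }

    List<DataFile> files() { return List.copyOf(files); }

    // New data is always written with the current spec.
    void write(String path) { files.add(new DataFile(path, currentSpecId())); }

    int addField(String field) {
        List<String> next = new ArrayList<>(spec(currentSpecId()));
        next.add(field);
        return commit(next);
    }

    int dropField(String field) {
        List<String> next = new ArrayList<>(spec(currentSpecId()));
        if (!next.remove(field)) {
            throw new IllegalArgumentException("no such partition field: " + field);
        }
        return commit(next);
    }

    // Assumption: replace = drop the old field and append the new one in one new spec.
    int replaceField(String oldField, String newField) {
        List<String> next = new ArrayList<>(spec(currentSpecId()));
        if (!next.remove(oldField)) {
            throw new IllegalArgumentException("no such partition field: " + oldField);
        }
        next.add(newField);
        return commit(next);
    }

    // Only metadata changes: a new spec version is appended, existing files are untouched.
    private int commit(List<String> next) {
        specs.add(List.copyOf(next));
        return currentSpecId();
    }

    public static void main(String[] args) {
        PartitionSpecHistorySketch table = new PartitionSpecHistorySketch(List.of("day(ts)"));
        table.write("f1.parquet");
        table.replaceField("day(ts)", "year(ts)");
        table.write("f2.parquet");
        for (DataFile f : table.files()) {
            System.out.println(f.path() + " -> spec " + f.specId() + " " + table.spec(f.specId()));
        }
    }
}
```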
Reviewed changes
Copilot reviewed 27 out of 27 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| DorisParser.g4 | Added grammar rules for ADD/DROP/REPLACE PARTITION KEY with partition transform expressions |
| LogicalPlanBuilder.java | Implemented parser logic to extract partition field information and create operation objects |
| Add/Drop/ReplacePartitionFieldOp.java | Nereids operation classes for partition evolution with validation and SQL generation |
| Add/Drop/ReplacePartitionFieldClause.java | Analysis clause classes for legacy planner compatibility |
| AlterTableCommand.java | Extended to support partition field operations for external tables |
| Alter.java | Added validation to reject partition field operations on internal tables and handle external table operations |
| IcebergMetadataOps.java | Core implementation using Iceberg's UpdatePartitionSpec API with transform support |
| IcebergExternalCatalog.java | Added public methods to invoke partition evolution operations |
| IcebergExternalTable.java | Removed caching mechanism from isValidRelatedTable() to handle partition evolution |
| AlterOpType.java | Added three new operation types for partition evolution |
| test_iceberg_partition_evolution_*.groovy | Comprehensive test suites covering DDL, query/write, and MTMV scenarios |
fe/fe-core/src/main/java/org/apache/doris/nereids/parser/LogicalPlanBuilder.java (outdated)
fe/fe-core/src/main/java/org/apache/doris/nereids/parser/LogicalPlanBuilder.java (outdated)
...rc/main/java/org/apache/doris/nereids/trees/plans/commands/info/ReplacePartitionFieldOp.java
fe/fe-core/src/main/java/org/apache/doris/analysis/AddPartitionFieldClause.java
fe/fe-core/src/main/java/org/apache/doris/datasource/hive/IcebergDlaTable.java
fe/fe-core/src/main/java/org/apache/doris/datasource/iceberg/IcebergMetadataOps.java (outdated)
run buildall
TPC-H: Total hot run time: 34838 ms
TPC-DS: Total hot run time: 187181 ms
ClickBench: Total hot run time: 27.48 s
FE UT Coverage Report: Increment line coverage
FE Regression Coverage Report: Increment line coverage
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
…apache#57972)

### What problem does this PR solve?

## Summary

This PR implements support for partition evolution in Iceberg tables, allowing users to dynamically modify table partition strategies without rewriting data files. This is a metadata-only operation that maintains multiple partition spec versions.

## Background

Apache Iceberg supports partition evolution, which enables changing partition strategies on existing tables without data migration. Doris, as a query engine for Iceberg, needs to support SQL syntax for partition evolution operations to provide users with flexible partition management.

## Features

### Core Functionality

- **Add Partition Field**: Add new partition field to existing partition specifications
- **Drop Partition Field**: Remove partition field from existing partition specifications
- **Replace Partition Field**: Replace partition field from existing partition specifications with new partition field

### Design Principles

1. **Metadata-only operation**: Partition evolution only updates table metadata, no data files are rewritten
2. **Backward compatibility**: Historical data retains original partition specs, new data uses new partition specs
3. **Syntax compatibility**: Follows Spark SQL ALTER TABLE syntax for consistency

## Syntax

### Add Partition Field

```sql
-- use optional AS keyword to specify a custom name for the partition field
ALTER TABLE table_name ADD PARTITION KEY partition_transform [AS key_name];

-- example
ALTER TABLE prod.db.sample ADD PARTITION KEY bucket(16, id);
ALTER TABLE prod.db.sample ADD PARTITION KEY truncate(4, data);
ALTER TABLE prod.db.sample ADD PARTITION KEY year(ts);

-- use optional AS keyword to specify a custom name for the partition field
ALTER TABLE prod.db.sample ADD PARTITION KEY bucket(16, id) AS shard;
```

### Drop Partition Field

```sql
ALTER TABLE table_name DROP PARTITION KEY partition_transform|key_name;

-- example
ALTER TABLE prod.db.sample DROP PARTITION KEY catalog;
ALTER TABLE prod.db.sample DROP PARTITION KEY bucket(16, id);
ALTER TABLE prod.db.sample DROP PARTITION KEY truncate(4, data);
ALTER TABLE prod.db.sample DROP PARTITION KEY year(ts);
ALTER TABLE prod.db.sample DROP PARTITION KEY shard;
```

### Replace Partition Field

```sql
-- use optional AS keyword to specify a custom name for the partition field
ALTER TABLE table_name REPLACE PARTITION KEY key_name WITH partition_transform [AS key_name];

-- example
ALTER TABLE prod.db.sample REPLACE PARTITION KEY ts_day WITH day(ts);

-- use optional AS keyword to specify a custom name for the new partition field
ALTER TABLE prod.db.sample REPLACE PARTITION KEY ts_day WITH day(ts) AS day_of_ts;
```

### Supported Partition Transforms

| Transform | Syntax | Example |
|-----------|--------|---------|
| bucket | `bucket(N, column)` | `bucket(16, id)` |
| truncate | `truncate(N, column)` | `truncate(10, name)` |
| year | `year(column)` | `year(ts)` |
| month | `month(column)` | `month(ts)` |
| day | `day(column)` | `day(ts)` |
| hour | `hour(column)` | `hour(ts)` |
| identity | `column` | `category` |
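
The transform syntax in the table above can be illustrated with a small stand-alone parser. This is a minimal sketch, assuming simple unquoted identifiers; it is not the grammar added to DorisParser.g4, and the class `PartitionTransformSketch`, the record `Transform`, and the method `parse` are hypothetical names chosen for illustration.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the Doris parser: maps the textual transforms from
// the table to a (name, numeric argument, column) triple.
final class PartitionTransformSketch {
    // arg is null for transforms that take no numeric argument.
    record Transform(String name, Integer arg, String column) {}

    // bucket(N, column) and truncate(N, column)
    private static final Pattern WITH_ARG =
            Pattern.compile("(?i)(bucket|truncate)\\(\\s*(\\d+)\\s*,\\s*(\\w+)\\s*\\)");
    // year(column), month(column), day(column), hour(column)
    private static final Pattern TIME =
            Pattern.compile("(?i)(year|month|day|hour)\\(\\s*(\\w+)\\s*\\)");
    // identity: a bare column name (assumption: unquoted, word characters only)
    private static final Pattern IDENTITY = Pattern.compile("\\w+");

    static Transform parse(String text) {
        String s = text.trim();
        Matcher m = WITH_ARG.matcher(s);
        if (m.matches()) {
            return new Transform(m.group(1).toLowerCase(Locale.ROOT),
                    Integer.valueOf(m.group(2)), m.group(3));
        }
        m = TIME.matcher(s);
        if (m.matches()) {
            return new Transform(m.group(1).toLowerCase(Locale.ROOT), null, m.group(2));
        }
        if (IDENTITY.matcher(s).matches()) {
            return new Transform("identity", null, s);
        }
        throw new IllegalArgumentException("unsupported partition transform: " + text);
    }

    public static void main(String[] args) {
        for (String example : new String[] {"bucket(16, id)", "truncate(10, name)", "year(ts)", "category"}) {
            System.out.println(example + " -> " + parse(example));
        }
    }
}
```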
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)