[feat](iceberg) Add OPTIMIZE TABLE syntax and framework with Iceberg action implementations #55679

Merged

morningman merged 24 commits into apache:master on Sep 12, 2025
Conversation
Contributor

Thank you for your contribution to Apache Doris. Please clearly describe your PR:
Contributor (Author)

run buildall

TPC-H: Total hot run time: 34157 ms
TPC-DS: Total hot run time: 186857 ms
ClickBench: Total hot run time: 29.88 s

Contributor

FE Regression Coverage Report: Increment line coverage
morningman reviewed Sep 8, 2025

Contributor

morningman left a comment:

Please add ut to test the validation logic of each action
.../src/main/java/org/apache/doris/datasource/iceberg/action/IcebergRewriteDataFilesAction.java (review comments resolved)
d751ed6 to 036afe4
Contributor (Author)

run buildall
8434379 to 87c3092
Contributor (Author)

run buildall

TPC-H: Total hot run time: 34447 ms
TPC-DS: Total hot run time: 188461 ms
ClickBench: Total hot run time: 30.28 s

Contributor

FE UT Coverage Report: Increment line coverage
Contributor (Author)

run buildall

TPC-H: Total hot run time: 35098 ms
TPC-DS: Total hot run time: 189607 ms
ClickBench: Total hot run time: 29.89 s

Contributor

FE Regression Coverage Report: Increment line coverage
morningman approved these changes Sep 10, 2025

Contributor

PR approved by at least one committer and no changes requested.
morrySnow approved these changes Sep 11, 2025
morningman pushed a commit that referenced this pull request on Oct 9, 2025
### What problem does this PR solve?

This PR extends the OPTIMIZE TABLE framework introduced in #55679 by implementing additional Iceberg meta procedure actions. Building upon the foundation established for Iceberg table optimization, this enhancement adds critical snapshot management operations that enable more sophisticated Iceberg table maintenance workflows.

#### New Iceberg Actions Implemented

This PR introduces **5 new Iceberg meta procedure actions**:

1. **`cherrypick_snapshot`** - Cherry-picks changes from a specific snapshot
2. **`fast_forward`** - Fast-forwards one branch to match another branch's latest snapshot
3. **`rollback_to_snapshot`** - Rolls back the table to a specific snapshot
4. **`rollback_to_timestamp`** - Rolls back the table to a specific timestamp
5. **`set_current_snapshot`** - Sets a specific snapshot as current

#### Example Usage

```sql
-- Cherry-pick changes from a snapshot
OPTIMIZE TABLE iceberg_catalog.db.table
PROPERTIES("action" = "cherrypick_snapshot", "snapshot_id" = "123456789");

-- Fast-forward branch to match another branch
OPTIMIZE TABLE iceberg_catalog.db.table
PROPERTIES("action" = "fast_forward", "branch" = "feature", "to" = "main");

-- Rollback to a specific snapshot
OPTIMIZE TABLE iceberg_catalog.db.table
PROPERTIES("action" = "rollback_to_snapshot", "snapshot_id" = "987654321");
```

The regression testing strategy uses internal Iceberg catalog operations for table creation, data insertion, and branch/tag management, ensuring test stability and eliminating dependencies on external tools such as Spark SQL for test data preparation.
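The per-action property validation these actions require can be sketched as follows. This is a minimal illustration, not the Doris FE code: the property names mirror the SQL examples above, and the `timestamp` key for `rollback_to_timestamp` is an assumption not confirmed by this PR's text.

```python
# Hypothetical sketch of per-action required-property validation.
# Property names follow the SQL examples above; "timestamp" for
# rollback_to_timestamp is an assumption, not taken from the PR.
REQUIRED_PROPS = {
    "cherrypick_snapshot": {"snapshot_id"},
    "fast_forward": {"branch", "to"},
    "rollback_to_snapshot": {"snapshot_id"},
    "rollback_to_timestamp": {"timestamp"},
    "set_current_snapshot": {"snapshot_id"},
}

def validate_action(action: str, props: dict) -> None:
    """Raise ValueError if the action is unknown or required properties are missing."""
    if action not in REQUIRED_PROPS:
        raise ValueError(f"unknown action: {action}")
    missing = REQUIRED_PROPS[action] - props.keys()
    if missing:
        raise ValueError(f"{action} requires properties: {sorted(missing)}")
```

This kind of table-driven check is also the sort of validation logic the reviewer asked to cover with unit tests.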
github-actions bot pushed a commit that referenced this pull request on Oct 9, 2025
morningman pushed a commit that referenced this pull request on Oct 10, 2025
…56638)

### What problem does this PR solve?

Issue: #56002
Related: #55679

This PR transforms the existing OPTIMIZE TABLE syntax into the more standard ALTER TABLE EXECUTE action syntax. This change provides a unified interface for table action operations across different table engines in Apache Doris.

#### New ALTER TABLE EXECUTE Syntax

```sql
ALTER TABLE [catalog.]database.table
EXECUTE action("key1" = "value1", "key2" = "value2", ...)
[PARTITION (partition_list)]
[WHERE condition]
```
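The mapping from an action name plus its properties onto the new EXECUTE form can be illustrated with a small string-rendering helper. This is purely illustrative and not part of the PR; it only shows how the pieces of the syntax fit together.

```python
def render_execute(table: str, action: str, props: dict) -> str:
    """Render an ALTER TABLE ... EXECUTE statement (illustrative, not Doris code).

    `table` is a fully qualified [catalog.]database.table name; `props` are the
    action's key/value properties, emitted in the "k" = "v" style shown above.
    """
    kv = ", ".join(f'"{k}" = "{v}"' for k, v in props.items())
    return f"ALTER TABLE {table} EXECUTE {action}({kv})"
```

For example, a rollback action renders as a single EXECUTE statement with its properties in the parenthesized list.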
morningman pushed a commit to apache/doris-website that referenced this pull request on Nov 10, 2025
Issue Number: apache/doris#56002
Related PR: apache/doris#55679 apache/doris#56638 apache/doris#56413
morningman pushed a commit that referenced this pull request on Nov 10, 2025
…le optimization and compaction (#56413)

### What problem does this PR solve?

**Issue Number:** #56002
**Related PR:** #55679 #56638

This PR implements the `rewrite_data_files` action for Apache Iceberg tables in Doris, providing comprehensive table optimization and data file compaction capabilities. This feature allows users to reorganize data files to improve query performance, optimize storage efficiency, and maintain delete files according to Iceberg's official specification.

---

## Feature Description

This PR implements the `rewrite_data_files` operation for Iceberg tables, providing table optimization and data file compaction capabilities. The feature follows Iceberg's official `RewriteDataFiles` specification and provides the following core capabilities:

1. **Data File Compaction**: Merges multiple small files into larger files, reducing file count and improving query performance
2. **Storage Efficiency Optimization**: Reduces storage overhead through file reorganization and optimizes data distribution
3. **Delete File Management**: Properly handles and maintains delete files, reducing filtering overhead during queries
4. **WHERE Condition Support**: Supports rewriting specific data ranges through WHERE conditions, including various data types (BIGINT, STRING, INT, DOUBLE, BOOLEAN, DATE, TIMESTAMP, DECIMAL) and complex conditional expressions
5. **Concurrent Execution**: Supports concurrent execution of multiple rewrite tasks for improved processing efficiency

After execution, detailed statistics are returned, including:

- `rewritten_data_files_count`: Number of data files that were rewritten
- `added_data_files_count`: Number of new data files generated
- `rewritten_bytes_count`: Number of bytes rewritten
- `removed_delete_files_count`: Number of delete files removed

---

## Usage Example

### Basic Usage

```sql
-- Rewrite data files with default parameters
ALTER TABLE iceberg_catalog.db.table EXECUTE rewrite_data_files();
```

### Custom Parameters

```sql
-- Specify target file size and minimum input files
ALTER TABLE iceberg_catalog.db.table EXECUTE rewrite_data_files(
    "target-file-size-bytes" = "104857600",
    "min-input-files" = "3"
);
```

### Rewrite with WHERE Conditions

```sql
-- Rewrite only data within a specific date range
ALTER TABLE iceberg_catalog.db.table EXECUTE rewrite_data_files(
    "target-file-size-bytes" = "104857600",
    "min-input-files" = "3",
    "delete-ratio-threshold" = "0.2"
) WHERE created_date >= '2024-01-01' AND status = 'active';

-- Rewrite data satisfying complex conditions
ALTER TABLE iceberg_catalog.db.table EXECUTE rewrite_data_files(
    "target-file-size-bytes" = "536870912"
) WHERE age > 25 AND salary > 50000.0 AND is_active = true;
```

### Rewrite All Files

```sql
-- Ignore file size limits and rewrite all files
ALTER TABLE iceberg_catalog.db.table EXECUTE rewrite_data_files("rewrite-all" = "true");
```

### Handle Delete Files

```sql
-- Trigger rewrite when delete file count or ratio exceeds the threshold
ALTER TABLE iceberg_catalog.db.table EXECUTE rewrite_data_files(
    "delete-file-threshold" = "10",
    "delete-ratio-threshold" = "0.3"
);
```

---

## Parameter List

### File Size Parameters

| Parameter Name | Type | Default Value | Description |
|----------------|------|---------------|-------------|
| `target-file-size-bytes` | Long | 536870912 (512MB) | Target size in bytes for output files |
| `min-file-size-bytes` | Long | 0 (auto-calculated as 75% of target) | Minimum file size in bytes for files to be rewritten |
| `max-file-size-bytes` | Long | 0 (auto-calculated as 180% of target) | Maximum file size in bytes for files to be rewritten |

### Input Files Parameters

| Parameter Name | Type | Default Value | Description |
|----------------|------|---------------|-------------|
| `min-input-files` | Int | 5 | Minimum number of input files to rewrite together |
| `rewrite-all` | Boolean | false | Whether to rewrite all files regardless of size |
| `max-file-group-size-bytes` | Long | 107374182400 (100GB) | Maximum size in bytes for a file group to be rewritten |

### Delete Files Parameters

| Parameter Name | Type | Default Value | Description |
|----------------|------|---------------|-------------|
| `delete-file-threshold` | Int | Integer.MAX_VALUE | Minimum number of delete files to trigger rewrite |
| `delete-ratio-threshold` | Double | 0.3 | Minimum ratio of delete records to total records to trigger rewrite (0.0-1.0) |

### Output Specification Parameters

| Parameter Name | Type | Default Value | Description |
|----------------|------|---------------|-------------|
| `output-spec-id` | Long | 2 | Partition specification ID for output files |

### Parameter Notes

- If `min-file-size-bytes` is not specified, the default value is `target-file-size-bytes * 0.75`
- If `max-file-size-bytes` is not specified, the default value is `target-file-size-bytes * 1.8`
- File groups are only rewritten when they meet the `min-input-files` condition
- `delete-file-threshold` and `delete-ratio-threshold` are used to determine whether a rewrite is needed to handle delete files

---

## Execution Flow

### Overall Process

```
1. Parameter Validation and Table Retrieval
   ├─ Validate rewrite parameters
   ├─ Get Iceberg table reference
   └─ Check if table has data snapshots
2. File Planning and Grouping
   ├─ Use RewriteDataFilePlanner to plan file scan tasks
   ├─ Filter file scan tasks based on WHERE conditions
   ├─ Organize file groups by partition and size constraints
   └─ Filter file groups that don't meet rewrite conditions
3. Concurrent Rewrite Execution
   ├─ Create RewriteDataFileExecutor
   ├─ Execute multiple file group rewrite tasks concurrently
   ├─ Each task executes INSERT-SELECT statements
   └─ Wait for all tasks to complete
4. Transaction Commit and Result Return
   ├─ Commit transaction and create new snapshot
   ├─ Update table metadata
   └─ Return detailed execution result statistics
```

### Detailed Steps

#### Step 1: Parameter Validation and Table Retrieval

- Validate all parameters for validity and value ranges
- If the table has no snapshots, return an empty result directly
- Calculate default values for `min-file-size-bytes` and `max-file-size-bytes` based on parameters

#### Step 2: File Planning and Grouping (RewriteDataFilePlanner)

- **File Scanning**: Build a `TableScan` based on WHERE conditions to get qualified `FileScanTask`s
- **File Filtering**: Filter files based on `min-file-size-bytes`, `max-file-size-bytes`, and `rewrite-all` parameters
- **Partition Grouping**: Group files into `RewriteDataGroup`s by partition specification
- **Size Constraints**: Ensure each file group doesn't exceed `max-file-group-size-bytes`
- **Delete File Check**: Determine if a rewrite is needed based on `delete-file-threshold` and `delete-ratio-threshold`

#### Step 3: Concurrent Rewrite Execution (RewriteDataFileExecutor)

- **Task Creation**: Create a `RewriteGroupTask` for each `RewriteDataGroup`
- **Concurrent Execution**: Use a thread pool to execute multiple rewrite tasks concurrently
- **Data Writing**: Each task executes `INSERT INTO ... SELECT FROM ...` statements to write data to new files
- **Progress Tracking**: Use atomic counters and a `CountDownLatch` to track task completion

#### Step 4: Transaction Commit and Result Return

- **Transaction Management**: Use `IcebergTransaction` to manage transactions, ensuring atomicity
- **Metadata Update**: Commit transaction to create a new snapshot and update table metadata
- **Result Statistics**: Aggregate execution results from all tasks and return statistics
suxiaogang223 added a commit to suxiaogang223/doris that referenced this pull request on Nov 10, 2025
suxiaogang223 added a commit to suxiaogang223/doris that referenced this pull request on Nov 11, 2025
suxiaogang223 added a commit to suxiaogang223/doris that referenced this pull request on Nov 11, 2025
suxiaogang223 added a commit to suxiaogang223/doris that referenced this pull request on Nov 12, 2025
suxiaogang223 added a commit to suxiaogang223/doris that referenced this pull request on Nov 12, 2025
…le optimization and compaction (apache#56413) **Issue Number:** apache#56002 **Related PR:** apache#55679 apache#56638 This PR implements the `rewrite_data_files` action for Apache Iceberg tables in Doris, providing comprehensive table optimization and data file compaction capabilities. This feature allows users to reorganize data files to improve query performance, optimize storage efficiency, and maintain delete files according to Iceberg's official specification. --- This PR implements the `rewrite_data_files` operation for Iceberg tables, providing table optimization and data file compaction capabilities. The feature follows Iceberg's official `RewriteDataFiles` specification and provides the following core capabilities: 1. **Data File Compaction**: Merges multiple small files into larger files, reducing file count and improving query performance 2. **Storage Efficiency Optimization**: Reduces storage overhead through file reorganization and optimizes data distribution 3. **Delete File Management**: Properly handles and maintains delete files, reducing filtering overhead during queries 4. **WHERE Condition Support**: Supports rewriting specific data ranges through WHERE conditions, including various data types (BIGINT, STRING, INT, DOUBLE, BOOLEAN, DATE, TIMESTAMP, DECIMAL) and complex conditional expressions 5. 
**Concurrent Execution**: Supports concurrent execution of multiple rewrite tasks for improved processing efficiency

After execution, detailed statistics are returned, including:

- `rewritten_data_files_count`: Number of data files that were rewritten
- `added_data_files_count`: Number of new data files generated
- `rewritten_bytes_count`: Number of bytes rewritten
- `removed_delete_files_count`: Number of delete files removed

---

## Usage Example

### Basic Usage

```sql
-- Rewrite data files with default parameters
ALTER TABLE iceberg_catalog.db.table EXECUTE rewrite_data_files();
```

### Custom Parameters

```sql
-- Specify target file size and minimum input files
ALTER TABLE iceberg_catalog.db.table EXECUTE rewrite_data_files(
    "target-file-size-bytes" = "104857600",
    "min-input-files" = "3"
);
```

### Rewrite with WHERE Conditions

```sql
-- Rewrite only data within a specific date range
ALTER TABLE iceberg_catalog.db.table EXECUTE rewrite_data_files(
    "target-file-size-bytes" = "104857600",
    "min-input-files" = "3",
    "delete-ratio-threshold" = "0.2"
) WHERE created_date >= '2024-01-01' AND status = 'active';

-- Rewrite data satisfying complex conditions
ALTER TABLE iceberg_catalog.db.table EXECUTE rewrite_data_files(
    "target-file-size-bytes" = "536870912"
) WHERE age > 25 AND salary > 50000.0 AND is_active = true;
```

### Rewrite All Files

```sql
-- Ignore file size limits and rewrite all files
ALTER TABLE iceberg_catalog.db.table EXECUTE rewrite_data_files("rewrite-all" = "true");
```

### Handle Delete Files

```sql
-- Trigger rewrite when the delete file count or ratio exceeds a threshold
ALTER TABLE iceberg_catalog.db.table EXECUTE rewrite_data_files(
    "delete-file-threshold" = "10",
    "delete-ratio-threshold" = "0.3"
);
```

---

## Parameter List

### File Size Parameters

| Parameter Name | Type | Default Value | Description |
|----------------|------|---------------|-------------|
| `target-file-size-bytes` | Long | 536870912 (512MB) | Target size in bytes for output files |
| `min-file-size-bytes` | Long | 0 (auto-calculated as 75% of target) | Minimum file size in bytes for files to be rewritten |
| `max-file-size-bytes` | Long | 0 (auto-calculated as 180% of target) | Maximum file size in bytes for files to be rewritten |

### Input Files Parameters

| Parameter Name | Type | Default Value | Description |
|----------------|------|---------------|-------------|
| `min-input-files` | Int | 5 | Minimum number of input files to rewrite together |
| `rewrite-all` | Boolean | false | Whether to rewrite all files regardless of size |
| `max-file-group-size-bytes` | Long | 107374182400 (100GB) | Maximum size in bytes for a file group to be rewritten |

### Delete Files Parameters

| Parameter Name | Type | Default Value | Description |
|----------------|------|---------------|-------------|
| `delete-file-threshold` | Int | Integer.MAX_VALUE | Minimum number of delete files to trigger rewrite |
| `delete-ratio-threshold` | Double | 0.3 | Minimum ratio of delete records to total records to trigger rewrite (0.0-1.0) |

### Output Specification Parameters

| Parameter Name | Type | Default Value | Description |
|----------------|------|---------------|-------------|
| `output-spec-id` | Long | 2 | Partition specification ID for output files |

### Parameter Notes

- If `min-file-size-bytes` is not specified, the default is `target-file-size-bytes * 0.75`
- If `max-file-size-bytes` is not specified, the default is `target-file-size-bytes * 1.8`
- File groups are only rewritten when they meet the `min-input-files` condition
- `delete-file-threshold` and `delete-ratio-threshold` determine whether a rewrite is needed to handle delete files

---

## Execution Flow

### Overall Process

```
1. Parameter Validation and Table Retrieval
   ├─ Validate rewrite parameters
   ├─ Get Iceberg table reference
   └─ Check if table has data snapshots
2. File Planning and Grouping
   ├─ Use RewriteDataFilePlanner to plan file scan tasks
   ├─ Filter file scan tasks based on WHERE conditions
   ├─ Organize file groups by partition and size constraints
   └─ Filter file groups that don't meet rewrite conditions
3. Concurrent Rewrite Execution
   ├─ Create RewriteDataFileExecutor
   ├─ Execute multiple file group rewrite tasks concurrently
   ├─ Each task executes INSERT-SELECT statements
   └─ Wait for all tasks to complete
4. Transaction Commit and Result Return
   ├─ Commit transaction and create new snapshot
   ├─ Update table metadata
   └─ Return detailed execution result statistics
```

### Detailed Steps

#### Step 1: Parameter Validation and Table Retrieval

- Validate all parameters for validity and value ranges
- If the table has no snapshots, return an empty result directly
- Calculate default values for `min-file-size-bytes` and `max-file-size-bytes` based on the given parameters

#### Step 2: File Planning and Grouping (RewriteDataFilePlanner)

- **File Scanning**: Build a `TableScan` based on WHERE conditions to get the qualifying `FileScanTask`s
- **File Filtering**: Filter files based on the `min-file-size-bytes`, `max-file-size-bytes`, and `rewrite-all` parameters
- **Partition Grouping**: Group files into `RewriteDataGroup`s by partition specification
- **Size Constraints**: Ensure each file group doesn't exceed `max-file-group-size-bytes`
- **Delete File Check**: Determine whether a rewrite is needed based on `delete-file-threshold` and `delete-ratio-threshold`

#### Step 3: Concurrent Rewrite Execution (RewriteDataFileExecutor)

- **Task Creation**: Create a `RewriteGroupTask` for each `RewriteDataGroup`
- **Concurrent Execution**: Use a thread pool to execute multiple rewrite tasks concurrently
- **Data Writing**: Each task executes `INSERT INTO ... SELECT FROM ...` statements to write data to new files
- **Progress Tracking**: Use atomic counters and a `CountDownLatch` to track task completion

#### Step 4: Transaction Commit and Result Return

- **Transaction Management**: Use `IcebergTransaction` to manage transactions, ensuring atomicity
- **Metadata Update**: Commit the transaction to create a new snapshot and update table metadata
- **Result Statistics**: Aggregate execution results from all tasks and return statistics
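The file-size defaults and grouping rules described above can be sketched in a few lines. This is a simplified illustration of the documented behavior, not the actual Doris `RewriteDataFilePlanner` (which is Java); the function names `effective_size_bounds` and `plan_file_groups` are hypothetical:

```python
# Simplified sketch of the rewrite planning rules documented above.
# Assumption: the real Java planner applies equivalent filters; these
# helper names do not exist in the Doris codebase.

def effective_size_bounds(target, min_size=0, max_size=0):
    """Apply the documented defaults: 75% / 180% of the target size."""
    if min_size == 0:
        min_size = int(target * 0.75)
    if max_size == 0:
        max_size = int(target * 1.8)
    return min_size, max_size

def plan_file_groups(files, target=536870912, min_size=0, max_size=0,
                     min_input_files=5, rewrite_all=False,
                     max_group_size=107374182400):
    """files: list of (partition, size_bytes) tuples.
    Returns the per-partition file groups that qualify for rewriting."""
    min_size, max_size = effective_size_bounds(target, min_size, max_size)
    # File filtering: keep files outside [min_size, max_size], or every
    # file when rewrite-all is set.
    candidates = [f for f in files
                  if rewrite_all or f[1] < min_size or f[1] > max_size]
    # Partition grouping, honoring the max-file-group-size constraint.
    groups = {}
    for part, size in candidates:
        groups.setdefault(part, [[]])
        if sum(s for _, s in groups[part][-1]) + size > max_group_size:
            groups[part].append([])
        groups[part][-1].append((part, size))
    # Only groups meeting min-input-files are rewritten.
    return [g for gs in groups.values() for g in gs
            if len(g) >= min_input_files]
```

Note how a 600MB file falls inside the default `[402653184, 966367641]` byte window and is therefore left alone unless `rewrite-all` is set.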
suxiaogang223
added a commit
to suxiaogang223/doris
that referenced
this pull request
Nov 13, 2025
…pache#56638) Issue: apache#56002 Related: apache#55679

This PR transforms the existing OPTIMIZE TABLE syntax to the more standard ALTER TABLE EXECUTE action syntax. This change provides a unified interface for table action operations across different table engines in Apache Doris.

```sql
ALTER TABLE [catalog.]database.table
EXECUTE action("key1" = "value1", "key2" = "value2", ...)
[PARTITION (partition_list)]
[WHERE condition]
```
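Before an action runs, its `("key" = "value")` properties have to be checked against what the action accepts. The sketch below illustrates one plausible shape for that validation; the real framework is Java, and `validate_action_properties` and the `ALLOWED` registry are hypothetical names, not Doris APIs:

```python
# Hypothetical sketch of validating the key/value properties passed to
# ALTER TABLE ... EXECUTE. The registry shape is an assumption; only the
# action name "rollback_to_snapshot" and its "snapshot_id" property come
# from this PR's examples.

def validate_action_properties(action, props, allowed):
    """Reject unknown keys and report missing required keys."""
    spec = allowed[action]
    unknown = sorted(set(props) - set(spec["required"]) - set(spec["optional"]))
    missing = sorted(set(spec["required"]) - set(props))
    if unknown:
        raise ValueError(f"{action}: unknown properties {unknown}")
    if missing:
        raise ValueError(f"{action}: missing required properties {missing}")
    return True

# Example registry entry, modeled on the rollback_to_snapshot action.
ALLOWED = {
    "rollback_to_snapshot": {"required": ["snapshot_id"], "optional": []},
}
```

Validating eagerly, before any Iceberg API call, lets the command fail with a clear message instead of a half-applied table operation.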
suxiaogang223
added a commit
to suxiaogang223/doris
that referenced
this pull request
Nov 13, 2025
…le optimization and compaction (apache#56413) **Issue Number:** apache#56002 **Related PR:** apache#55679 apache#56638 This PR implements the `rewrite_data_files` action for Apache Iceberg tables in Doris, providing comprehensive table optimization and data file compaction capabilities.
wyxxxcat
pushed a commit
to wyxxxcat/doris
that referenced
this pull request
Nov 18, 2025
…le optimization and compaction (apache#56413) ### What problem does this PR solve? **Issue Number:** apache#56002 **Related PR:** apache#55679 apache#56638 This PR implements the `rewrite_data_files` action for Apache Iceberg tables in Doris, providing comprehensive table optimization and data file compaction capabilities.
Summary
related: #56002
This PR introduces the OPTIMIZE TABLE syntax and framework to Apache Doris, with initial implementations for Iceberg table optimization actions. This feature provides a unified interface for table optimization operations across different table engines.
New OPTIMIZE TABLE Syntax
This new syntax provides a unified interface for table optimization operations across different table engines in Doris.
Framework Architecture
1. Core Interface Design

- `OptimizeAction`: Generic interface for all optimization actions, defining `validate()`, `execute()`, `isSupported()`, and `getDescription()`

2. Factory Pattern Implementation

- `OptimizeActionFactory`: Main factory that routes requests to engine-specific factories
- `IcebergOptimizeActionFactory`: Iceberg-specific action factory

3. Command Processing

- `OptimizeTableCommand`: Handles parsing, validation, and execution coordination
- `BaseIcebergAction`: Abstract base class providing common functionality for Iceberg actions

Iceberg Action Implementations

This PR implements 8 Iceberg optimization procedures that inherit from `BaseIcebergAction`:

- `rollback_to_snapshot` → `IcebergRollbackToSnapshotAction`
- `rollback_to_timestamp` → `IcebergRollbackToTimestampAction`
- `set_current_snapshot` → `IcebergSetCurrentSnapshotAction`
- `cherrypick_snapshot` → `IcebergCherrypickSnapshotAction`
- `fast_forward` → `IcebergFastForwardAction`
- `expire_snapshots` → `IcebergExpireSnapshotsAction`
- `rewrite_data_files` → `IcebergRewriteDataFilesAction`

Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)
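The two-level factory routing described in the Framework Architecture section (a main factory dispatching to engine-specific factories, which map action names to implementation classes) can be sketched as follows. This is Python for brevity; the actual classes are Java, and the registry layout here is an assumption about the dispatch shape, not the real Doris API:

```python
# Illustrative sketch of OptimizeActionFactory-style dispatch: route by
# engine first, then by action name. The stub action class mirrors one of
# the implementations listed in this PR; its behavior here is invented.

class IcebergRollbackToSnapshotAction:
    """Stub standing in for the Java IcebergRollbackToSnapshotAction."""
    def execute(self):
        return "rolled back"

# Engine-specific factory: action name -> implementation class.
ICEBERG_ACTIONS = {
    "rollback_to_snapshot": IcebergRollbackToSnapshotAction,
}

# Main factory: engine name -> engine-specific factory.
ENGINE_FACTORIES = {
    "iceberg": ICEBERG_ACTIONS,
}

def create_action(engine, action_name):
    """Two-level routing, failing fast on unknown engines or actions."""
    actions = ENGINE_FACTORIES.get(engine)
    if actions is None:
        raise ValueError(f"unsupported engine: {engine}")
    cls = actions.get(action_name)
    if cls is None:
        raise ValueError(f"unsupported action: {action_name}")
    return cls()
```

Keeping the engine registry as data rather than an if/else chain is what lets a new engine (say, a future Hudi factory) plug in without touching the routing code, which appears to be the point of the factory design in this PR.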