
[feat](iceberg) Add OPTIMIZE TABLE syntax and framework with Iceberg action implementations #55679

Merged
morningman merged 24 commits into apache:master from suxiaogang223:support_optimize_gram
Sep 12, 2025

Conversation

@suxiaogang223
Contributor

@suxiaogang223 suxiaogang223 commented Sep 4, 2025

Summary

related: #56002

This PR introduces the OPTIMIZE TABLE syntax and framework to Apache Doris, with initial implementations for Iceberg table optimization actions. This feature provides a unified interface for table optimization operations across different table engines.

New OPTIMIZE TABLE Syntax

OPTIMIZE TABLE [catalog.]database.table 
    [PARTITION(partition1, partition2, ...)] 
    [WHERE condition] 
    PROPERTIES("action" = "action_name", "key1" = "value1", ...)

This new syntax provides a unified interface for table optimization operations across different table engines in Doris.
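
For example, rolling an Iceberg table back to an earlier snapshot can be expressed as follows (the `rollback_to_snapshot` action and its `snapshot_id` property appear again in the follow-up commits quoted later in this thread):

```sql
-- Roll back an Iceberg table to a specific snapshot
OPTIMIZE TABLE iceberg_catalog.db.table
PROPERTIES("action" = "rollback_to_snapshot", "snapshot_id" = "987654321");
```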

Framework Architecture

1. Core Interface Design

  • OptimizeAction: Generic interface for all optimization actions
    • Provides methods: validate(), execute(), isSupported(), getDescription()
    • Engine-agnostic design enables support for different table types

2. Factory Pattern Implementation

  • OptimizeActionFactory: Main factory that routes requests to engine-specific factories
  • IcebergOptimizeActionFactory: Iceberg-specific action factory
  • Clean separation between framework and engine-specific implementations

3. Command Processing

  • OptimizeTableCommand: Handles parsing, validation, and execution coordination
  • BaseIcebergAction: Abstract base class providing common functionality for Iceberg actions
  • Comprehensive permission checking and parameter validation

Iceberg Action Implementations

This PR implements the following Iceberg optimization procedures, all of which inherit from BaseIcebergAction:

| Action Type | Class | Description |
|-------------|-------|-------------|
| rollback_to_snapshot | IcebergRollbackToSnapshotAction | Rollback table to specific snapshot ID |
| rollback_to_timestamp | IcebergRollbackToTimestampAction | Rollback table to specific timestamp |
| set_current_snapshot | IcebergSetCurrentSnapshotAction | Set current snapshot to specific ID |
| cherrypick_snapshot | IcebergCherrypickSnapshotAction | Cherry-pick changes from snapshot |
| fast_forward | IcebergFastForwardAction | Fast-forward to target branch/snapshot |
| expire_snapshots | IcebergExpireSnapshotsAction | Remove old snapshots to free storage |
| rewrite_data_files | IcebergRewriteDataFilesAction | Optimize data file sizes and layout |
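
As a concrete illustration of one of these actions, `fast_forward` takes a source branch and a target reference; the example below mirrors the usage shown in the follow-up commit quoted later in this thread:

```sql
-- Fast-forward the 'feature' branch to match 'main'
OPTIMIZE TABLE iceberg_catalog.db.table
PROPERTIES("action" = "fast_forward", "branch" = "feature", "to" = "main");
```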

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@suxiaogang223 suxiaogang223 marked this pull request as draft September 4, 2025 09:39
@Thearas
Contributor

Thearas commented Sep 4, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@suxiaogang223 suxiaogang223 changed the title [feat](sql) support OPTIMIZE table statement [feat](iceberg) Add OPTIMIZE TABLE syntax and framework with Iceberg action implementations Sep 5, 2025
@suxiaogang223 suxiaogang223 marked this pull request as ready for review September 5, 2025 09:12
@suxiaogang223
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 34157 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 541032b83267641a0214e7103d1da4fe2723593d, data reload: false

------ Round 1 ----------------------------------
q1	17629	5207	5058	5058
q2	2009	322	208	208
q3	10264	1295	702	702
q4	10224	1023	546	546
q5	7498	2511	2320	2320
q6	188	176	140	140
q7	944	772	627	627
q8	9362	1339	1186	1186
q9	7099	5123	5238	5123
q10	6994	2400	1997	1997
q11	484	315	277	277
q12	381	360	231	231
q13	17815	3661	3030	3030
q14	238	231	219	219
q15	579	541	495	495
q16	438	426	386	386
q17	606	866	369	369
q18	7583	7065	7169	7065
q19	1094	973	581	581
q20	341	343	244	244
q21	3841	2553	2360	2360
q22	1085	1050	993	993
Total cold run time: 106696 ms
Total hot run time: 34157 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5617	5095	5061	5061
q2	248	333	228	228
q3	2158	2678	2342	2342
q4	1347	1782	1358	1358
q5	4206	4315	4616	4315
q6	217	180	135	135
q7	2131	1935	1891	1891
q8	2697	2644	2620	2620
q9	7440	7418	7363	7363
q10	3167	3488	2896	2896
q11	589	532	498	498
q12	777	964	644	644
q13	3523	3899	3327	3327
q14	305	301	277	277
q15	532	479	495	479
q16	451	490	469	469
q17	1188	1580	1395	1395
q18	7696	7780	7695	7695
q19	856	868	1019	868
q20	1970	1938	1805	1805
q21	4886	4290	4324	4290
q22	1113	1031	1027	1027
Total cold run time: 53114 ms
Total hot run time: 50983 ms

@doris-robot

TPC-DS: Total hot run time: 186857 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 541032b83267641a0214e7103d1da4fe2723593d, data reload: false

query1	1073	435	425	425
query2	6559	1693	1688	1688
query3	6751	229	218	218
query4	26167	24017	23341	23341
query5	4417	650	523	523
query6	348	238	246	238
query7	4681	514	302	302
query8	320	268	271	268
query9	8658	2945	2918	2918
query10	495	354	318	318
query11	15828	14988	14773	14773
query12	178	124	125	124
query13	1686	569	452	452
query14	9365	5911	5793	5793
query15	213	191	179	179
query16	7445	666	466	466
query17	1228	751	631	631
query18	2026	493	335	335
query19	206	202	169	169
query20	138	125	120	120
query21	215	127	118	118
query22	4399	4415	4243	4243
query23	33761	33115	32985	32985
query24	8261	2359	2377	2359
query25	583	530	445	445
query26	1242	278	168	168
query27	2741	518	356	356
query28	4392	2243	2212	2212
query29	787	607	487	487
query30	297	225	203	203
query31	914	796	746	746
query32	97	83	79	79
query33	570	423	355	355
query34	804	874	530	530
query35	859	857	736	736
query36	979	1017	919	919
query37	127	112	87	87
query38	4063	4039	3955	3955
query39	1495	1440	1445	1440
query40	223	132	126	126
query41	63	61	60	60
query42	129	116	122	116
query43	504	475	450	450
query44	1366	868	853	853
query45	183	182	168	168
query46	859	1027	656	656
query47	1810	1846	1768	1768
query48	401	434	326	326
query49	744	514	406	406
query50	660	689	405	405
query51	4181	4190	4107	4107
query52	120	121	111	111
query53	251	272	209	209
query54	615	589	529	529
query55	102	90	88	88
query56	334	341	315	315
query57	1194	1227	1142	1142
query58	289	279	282	279
query59	2628	2702	2678	2678
query60	369	357	343	343
query61	166	163	161	161
query62	817	743	680	680
query63	235	197	196	196
query64	4405	1275	953	953
query65	4313	4203	4296	4203
query66	1169	439	339	339
query67	15645	15448	14974	14974
query68	8471	921	581	581
query69	513	332	291	291
query70	1249	1113	1152	1113
query71	564	363	329	329
query72	6089	4991	5079	4991
query73	766	652	356	356
query74	9256	9081	8946	8946
query75	3832	3117	2591	2591
query76	3611	1193	760	760
query77	814	404	328	328
query78	9631	9839	8924	8924
query79	2457	840	597	597
query80	634	569	536	536
query81	482	256	224	224
query82	438	139	113	113
query83	301	273	242	242
query84	372	120	99	99
query85	885	472	421	421
query86	367	313	304	304
query87	4318	4310	4114	4114
query88	3344	2239	2252	2239
query89	479	320	285	285
query90	1912	230	223	223
query91	174	172	135	135
query92	91	75	75	75
query93	1775	1010	636	636
query94	690	421	323	323
query95	407	344	325	325
query96	489	576	281	281
query97	2647	2684	2603	2603
query98	251	216	232	216
query99	1466	1433	1311	1311
Total cold run time: 276918 ms
Total hot run time: 186857 ms

@doris-robot

ClickBench: Total hot run time: 29.88 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 541032b83267641a0214e7103d1da4fe2723593d, data reload: false

query1	0.06	0.05	0.06
query2	0.09	0.06	0.06
query3	0.25	0.09	0.09
query4	1.61	0.11	0.11
query5	0.46	0.44	0.40
query6	1.17	0.65	0.68
query7	0.04	0.03	0.03
query8	0.06	0.05	0.05
query9	0.61	0.54	0.52
query10	0.59	0.59	0.58
query11	0.17	0.13	0.11
query12	0.16	0.12	0.12
query13	0.62	0.62	0.61
query14	0.79	0.83	0.84
query15	0.90	0.85	0.87
query16	0.39	0.40	0.42
query17	1.02	1.03	1.08
query18	0.22	0.20	0.20
query19	1.93	1.91	1.91
query20	0.01	0.01	0.01
query21	15.39	0.92	0.59
query22	0.76	1.19	0.72
query23	14.87	1.38	0.62
query24	6.63	1.62	0.69
query25	0.57	0.22	0.13
query26	0.53	0.17	0.13
query27	0.05	0.06	0.06
query28	9.60	0.91	0.42
query29	12.62	3.92	3.23
query30	0.28	0.13	0.11
query31	2.82	0.59	0.39
query32	3.23	0.55	0.48
query33	3.04	3.10	3.08
query34	16.14	5.47	4.87
query35	4.92	4.97	4.88
query36	0.68	0.51	0.50
query37	0.11	0.07	0.08
query38	0.06	0.04	0.05
query39	0.03	0.03	0.03
query40	0.19	0.15	0.14
query41	0.09	0.04	0.03
query42	0.04	0.03	0.03
query43	0.04	0.04	0.03
Total cold run time: 103.84 s
Total hot run time: 29.88 s

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 51.61% (128/248) 🎉
Increment coverage report
Complete coverage report

Contributor

@morningman morningman left a comment

Please add ut to test the validation logic of each action

@suxiaogang223
Contributor Author

run buildall

@suxiaogang223
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 34447 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 87c3092076bc5770825831252cd96a89804bddd7, data reload: false

------ Round 1 ----------------------------------
q1	17804	5175	5085	5085
q2	1962	315	210	210
q3	10276	1266	708	708
q4	10233	1020	532	532
q5	7500	2449	2304	2304
q6	191	170	134	134
q7	907	754	624	624
q8	9360	1326	1116	1116
q9	6878	5234	5116	5116
q10	6928	2400	1934	1934
q11	483	296	272	272
q12	342	352	218	218
q13	17786	3664	3004	3004
q14	245	235	206	206
q15	581	507	493	493
q16	993	996	945	945
q17	602	859	356	356
q18	7384	7208	7064	7064
q19	1419	969	555	555
q20	342	333	243	243
q21	3833	2527	2355	2355
q22	1093	1019	973	973
Total cold run time: 107142 ms
Total hot run time: 34447 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5144	5073	5092	5073
q2	247	327	219	219
q3	2173	2648	2328	2328
q4	1327	1749	1343	1343
q5	4232	4453	4571	4453
q6	224	175	133	133
q7	2040	1927	1844	1844
q8	2740	2466	2629	2466
q9	7409	7349	7394	7349
q10	3127	3276	2874	2874
q11	577	507	505	505
q12	676	777	656	656
q13	3528	3951	3511	3511
q14	285	296	269	269
q15	533	483	471	471
q16	1077	1071	1047	1047
q17	1192	1531	1427	1427
q18	7882	7635	7643	7635
q19	769	817	884	817
q20	1874	1976	1804	1804
q21	4737	4358	4194	4194
q22	1097	1028	1002	1002
Total cold run time: 52890 ms
Total hot run time: 51420 ms

@doris-robot

TPC-DS: Total hot run time: 188461 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 87c3092076bc5770825831252cd96a89804bddd7, data reload: false

query1	1071	429	403	403
query2	6557	1689	1753	1689
query3	6760	222	229	222
query4	26666	23596	22780	22780
query5	4432	683	550	550
query6	352	268	236	236
query7	4655	525	299	299
query8	329	269	259	259
query9	8666	2900	2927	2900
query10	484	346	300	300
query11	15958	15059	14769	14769
query12	178	123	114	114
query13	1683	541	427	427
query14	11225	9203	9110	9110
query15	245	197	170	170
query16	7677	640	494	494
query17	1205	745	674	674
query18	2045	432	335	335
query19	205	203	167	167
query20	133	141	128	128
query21	217	130	115	115
query22	4080	4137	4063	4063
query23	33657	33019	33083	33019
query24	8161	2340	2435	2340
query25	581	527	475	475
query26	1241	303	171	171
query27	2714	515	359	359
query28	4405	2278	2240	2240
query29	811	632	523	523
query30	297	223	208	208
query31	917	832	715	715
query32	92	82	83	82
query33	589	396	384	384
query34	791	848	516	516
query35	840	823	750	750
query36	962	1027	920	920
query37	136	115	96	96
query38	3512	3457	3467	3457
query39	1486	1453	1412	1412
query40	228	135	128	128
query41	66	61	61	61
query42	128	118	121	118
query43	541	529	462	462
query44	1353	864	858	858
query45	180	177	169	169
query46	841	1012	649	649
query47	1797	1803	1742	1742
query48	395	431	327	327
query49	750	511	412	412
query50	659	684	397	397
query51	4041	3913	3855	3855
query52	119	115	111	111
query53	239	286	194	194
query54	609	604	542	542
query55	93	92	88	88
query56	352	336	319	319
query57	1186	1217	1142	1142
query58	299	278	278	278
query59	2572	2650	2508	2508
query60	358	344	358	344
query61	194	187	196	187
query62	835	738	676	676
query63	236	195	194	194
query64	4387	1144	826	826
query65	4034	3977	3954	3954
query66	1080	446	367	367
query67	15238	15444	15114	15114
query68	8087	918	590	590
query69	489	332	310	310
query70	1457	1388	1277	1277
query71	570	360	323	323
query72	5805	5015	5061	5015
query73	720	629	358	358
query74	9166	9226	8964	8964
query75	3964	3213	2746	2746
query76	3719	1215	738	738
query77	806	401	329	329
query78	9436	9739	8903	8903
query79	2184	819	588	588
query80	654	580	519	519
query81	464	256	234	234
query82	439	163	133	133
query83	279	256	246	246
query84	262	106	94	94
query85	944	459	411	411
query86	357	329	299	299
query87	3721	3763	3645	3645
query88	3318	2226	2213	2213
query89	398	315	293	293
query90	1957	230	223	223
query91	162	175	134	134
query92	84	80	69	69
query93	1406	983	636	636
query94	686	415	326	326
query95	411	331	330	330
query96	482	592	282	282
query97	2893	2949	2823	2823
query98	247	232	219	219
query99	1661	1396	1298	1298
Total cold run time: 276053 ms
Total hot run time: 188461 ms

@doris-robot

ClickBench: Total hot run time: 30.28 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 87c3092076bc5770825831252cd96a89804bddd7, data reload: false

query1	0.05	0.05	0.05
query2	0.09	0.04	0.06
query3	0.25	0.08	0.08
query4	1.62	0.12	0.12
query5	0.28	0.27	0.25
query6	1.20	0.66	0.65
query7	0.03	0.03	0.03
query8	0.06	0.05	0.05
query9	0.63	0.53	0.52
query10	0.59	0.58	0.57
query11	0.17	0.11	0.11
query12	0.15	0.12	0.12
query13	0.61	0.62	0.62
query14	1.02	1.04	1.02
query15	0.86	0.85	0.84
query16	0.42	0.41	0.40
query17	1.03	1.03	1.07
query18	0.21	0.20	0.20
query19	1.89	1.81	1.77
query20	0.02	0.02	0.01
query21	15.40	0.95	0.58
query22	0.77	1.15	0.71
query23	14.93	1.40	0.66
query24	6.56	1.71	1.16
query25	0.50	0.16	0.16
query26	0.65	0.17	0.14
query27	0.06	0.05	0.05
query28	9.78	0.91	0.42
query29	12.54	3.94	3.25
query30	0.29	0.13	0.12
query31	2.84	0.59	0.38
query32	3.26	0.56	0.47
query33	3.05	3.10	3.12
query34	16.29	5.46	4.87
query35	4.92	4.90	4.84
query36	0.71	0.51	0.51
query37	0.10	0.07	0.08
query38	0.06	0.05	0.05
query39	0.04	0.03	0.03
query40	0.18	0.17	0.14
query41	0.09	0.03	0.04
query42	0.04	0.03	0.02
query43	0.05	0.04	0.04
Total cold run time: 104.29 s
Total hot run time: 30.28 s

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 36.89% (152/412) 🎉
Increment coverage report
Complete coverage report

@suxiaogang223
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 35098 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e01349481886b9377bc8d2b896c78c9ded78b944, data reload: false

------ Round 1 ----------------------------------
q1	17604	5252	5026	5026
q2	2019	332	216	216
q3	10260	1278	724	724
q4	10247	1024	526	526
q5	7564	2426	2337	2337
q6	185	174	146	146
q7	931	773	636	636
q8	9347	1329	1094	1094
q9	7053	5078	5142	5078
q10	6891	2388	2002	2002
q11	475	295	280	280
q12	350	361	228	228
q13	17791	3656	3008	3008
q14	254	251	219	219
q15	575	492	492	492
q16	1001	1014	935	935
q17	598	850	364	364
q18	7758	7193	7051	7051
q19	1235	950	533	533
q20	340	339	234	234
q21	3905	3246	2956	2956
q22	1059	1042	1013	1013
Total cold run time: 107442 ms
Total hot run time: 35098 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5105	5069	5107	5069
q2	246	329	231	231
q3	2180	2708	2334	2334
q4	1355	1763	1351	1351
q5	4252	4683	4620	4620
q6	216	174	135	135
q7	2014	2021	1812	1812
q8	2592	2599	2662	2599
q9	7435	7403	7356	7356
q10	3064	3331	2853	2853
q11	574	508	513	508
q12	747	778	648	648
q13	3592	4028	3297	3297
q14	297	320	280	280
q15	541	496	478	478
q16	1120	1138	1046	1046
q17	1180	1584	1382	1382
q18	7876	7830	7768	7768
q19	788	836	808	808
q20	2009	2060	2075	2060
q21	4995	4309	4323	4309
q22	1088	1045	1019	1019
Total cold run time: 53266 ms
Total hot run time: 51963 ms

@doris-robot

TPC-DS: Total hot run time: 189607 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e01349481886b9377bc8d2b896c78c9ded78b944, data reload: false

query1	1082	435	401	401
query2	6557	1733	1714	1714
query3	6755	231	222	222
query4	26322	23620	23435	23435
query5	4873	653	516	516
query6	358	247	262	247
query7	4674	542	311	311
query8	330	266	245	245
query9	8650	2936	2933	2933
query10	525	345	297	297
query11	15456	15065	14742	14742
query12	182	127	118	118
query13	1675	565	428	428
query14	11545	9174	9201	9174
query15	213	191	172	172
query16	7671	672	460	460
query17	1625	763	625	625
query18	2052	427	340	340
query19	216	198	177	177
query20	132	123	119	119
query21	256	128	113	113
query22	4058	4250	3983	3983
query23	34068	32986	33121	32986
query24	8106	2390	2384	2384
query25	573	508	446	446
query26	1239	306	166	166
query27	2695	518	375	375
query28	4459	2294	2256	2256
query29	764	637	504	504
query30	296	236	200	200
query31	927	783	720	720
query32	90	95	81	81
query33	572	381	384	381
query34	794	858	526	526
query35	812	869	743	743
query36	968	1023	934	934
query37	132	117	95	95
query38	3513	3538	3508	3508
query39	1498	1449	1424	1424
query40	226	140	128	128
query41	67	63	63	63
query42	138	120	119	119
query43	501	522	477	477
query44	1352	883	872	872
query45	180	173	181	173
query46	872	1087	659	659
query47	1751	1797	1739	1739
query48	392	433	336	336
query49	747	503	414	414
query50	664	714	429	429
query51	3983	3885	3925	3885
query52	121	114	109	109
query53	249	278	200	200
query54	623	621	572	572
query55	100	103	95	95
query56	372	376	353	353
query57	1202	1206	1127	1127
query58	309	315	294	294
query59	2582	2693	2594	2594
query60	389	376	395	376
query61	166	158	155	155
query62	834	717	687	687
query63	233	205	201	201
query64	4424	1139	828	828
query65	4038	3965	3994	3965
query66	1107	524	352	352
query67	15729	15208	15290	15208
query68	7983	949	584	584
query69	513	332	303	303
query70	1340	1279	1307	1279
query71	600	363	331	331
query72	5788	5054	5182	5054
query73	690	654	366	366
query74	9113	9095	8919	8919
query75	3748	3280	2790	2790
query76	3460	1155	749	749
query77	817	410	342	342
query78	9687	9758	8891	8891
query79	2139	868	608	608
query80	683	586	530	530
query81	495	331	231	231
query82	255	164	149	149
query83	275	261	247	247
query84	270	121	99	99
query85	846	465	415	415
query86	384	312	319	312
query87	3735	3702	3552	3552
query88	2858	2244	2258	2244
query89	388	324	304	304
query90	1950	227	223	223
query91	160	164	188	164
query92	93	75	72	72
query93	2250	992	653	653
query94	668	420	325	325
query95	412	336	329	329
query96	472	598	284	284
query97	2930	2978	2865	2865
query98	243	216	222	216
query99	1299	1403	1266	1266
Total cold run time: 276608 ms
Total hot run time: 189607 ms

@doris-robot

ClickBench: Total hot run time: 29.89 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit e01349481886b9377bc8d2b896c78c9ded78b944, data reload: false

query1	0.06	0.06	0.05
query2	0.09	0.05	0.06
query3	0.25	0.08	0.08
query4	1.61	0.12	0.11
query5	0.29	0.28	0.25
query6	1.18	0.65	0.63
query7	0.02	0.03	0.03
query8	0.06	0.04	0.04
query9	0.61	0.54	0.52
query10	0.59	0.58	0.57
query11	0.17	0.12	0.12
query12	0.16	0.12	0.12
query13	0.62	0.62	0.62
query14	1.02	1.03	1.03
query15	0.87	0.86	0.86
query16	0.40	0.40	0.39
query17	1.04	1.05	1.04
query18	0.24	0.20	0.19
query19	1.93	1.86	1.86
query20	0.01	0.01	0.02
query21	15.40	0.94	0.57
query22	0.79	1.08	0.64
query23	15.04	1.39	0.65
query24	6.78	0.81	1.28
query25	0.54	0.18	0.10
query26	0.68	0.15	0.14
query27	0.06	0.05	0.05
query28	9.98	0.94	0.42
query29	12.54	3.95	3.24
query30	0.28	0.13	0.12
query31	2.83	0.60	0.38
query32	3.24	0.56	0.48
query33	3.10	3.16	3.06
query34	16.18	5.49	4.86
query35	4.95	4.92	4.92
query36	0.70	0.52	0.51
query37	0.10	0.07	0.07
query38	0.07	0.05	0.04
query39	0.04	0.03	0.03
query40	0.18	0.14	0.14
query41	0.08	0.04	0.03
query42	0.04	0.03	0.03
query43	0.04	0.03	0.04
Total cold run time: 104.86 s
Total hot run time: 29.89 s

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 61.66% (267/433) 🎉
Increment coverage report
Complete coverage report

@github-actions github-actions bot added the `approved` label (indicates a PR has been approved by one committer) Sep 10, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@morningman morningman merged commit cebcda4 into apache:master Sep 12, 2025
27 of 29 checks passed
morningman pushed a commit that referenced this pull request Oct 9, 2025
### What problem does this PR solve?

This PR extends the OPTIMIZE TABLE framework introduced in #55679 by
implementing additional Iceberg meta procedure actions. Building upon
the foundation established for Iceberg
table optimization, this enhancement adds critical snapshot management
operations that enable more sophisticated Iceberg table maintenance
workflows.


#### New Iceberg Actions Implemented

This PR introduces **5 new Iceberg meta procedure actions**:

1. **`cherrypick_snapshot`** - Cherry-picks changes from a specific
snapshot
2. **`fast_forward`** - Fast-forwards one branch to match another
branch's latest snapshot
3. **`rollback_to_snapshot`** - Rolls back table to a specific snapshot
4. **`rollback_to_timestamp`** - Rolls back table to a specific
timestamp
5. **`set_current_snapshot`** - Sets a specific snapshot as current

#### Example Usage

```sql
-- Cherry-pick changes from a snapshot
OPTIMIZE TABLE iceberg_catalog.db.table
PROPERTIES("action" = "cherrypick_snapshot", "snapshot_id" = "123456789");
```

```sql
-- Fast-forward branch to match another branch
OPTIMIZE TABLE iceberg_catalog.db.table
PROPERTIES("action" = "fast_forward", "branch" = "feature", "to" = "main");
```

```sql
-- Rollback to specific snapshot
OPTIMIZE TABLE iceberg_catalog.db.table
PROPERTIES("action" = "rollback_to_snapshot", "snapshot_id" = "987654321");
```

The regression testing strategy utilizes internal Iceberg catalog operations for table creation, data insertion, and branch/tag management, ensuring test stability and eliminating dependencies on external tools like Spark SQL for test data preparation.
github-actions bot pushed a commit that referenced this pull request Oct 9, 2025
morningman pushed a commit that referenced this pull request Oct 10, 2025
…56638)

### What problem does this PR solve?

Issue: #56002
Related: #55679 

This PR transforms the existing OPTIMIZE TABLE syntax to the more
standard ALTER TABLE EXECUTE action syntax. This change provides a
unified interface for table action operations across different table
engines in Apache Doris.

#### New ALTER TABLE EXECUTE Syntax

```sql
ALTER TABLE [catalog.]database.table 
  EXECUTE action("key1" = "value1", "key2" = "value2", ...) 
  [PARTITION (partition_list)]
  [WHERE condition]
```
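
Under this syntax an action is invoked by name with its parameters as key/value arguments; for instance, the Iceberg `rewrite_data_files` action (implemented in the #56413 follow-up quoted below) is called as:

```sql
ALTER TABLE iceberg_catalog.db.table
EXECUTE rewrite_data_files("target-file-size-bytes" = "104857600");
```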
morningman pushed a commit to apache/doris-website that referenced this pull request Nov 10, 2025
morningman pushed a commit that referenced this pull request Nov 10, 2025
…le optimization and compaction (#56413)

### What problem does this PR solve?

**Issue Number:** #56002

**Related PR:** #55679 #56638

This PR implements the `rewrite_data_files` action for Apache Iceberg
tables in Doris, providing comprehensive table optimization and data
file compaction capabilities. This feature allows users to reorganize
data files to improve query performance, optimize storage efficiency,
and maintain delete files according to Iceberg's official specification.

---

## Feature Description

This PR implements the `rewrite_data_files` operation for Iceberg
tables, providing table optimization and data file compaction
capabilities. The feature follows Iceberg's official `RewriteDataFiles`
specification and provides the following core capabilities:

1. **Data File Compaction**: Merges multiple small files into larger
files, reducing file count and improving query performance
2. **Storage Efficiency Optimization**: Reduces storage overhead through
file reorganization and optimizes data distribution
3. **Delete File Management**: Properly handles and maintains delete
files, reducing filtering overhead during queries
4. **WHERE Condition Support**: Supports rewriting specific data ranges
through WHERE conditions, including various data types (BIGINT, STRING,
INT, DOUBLE, BOOLEAN, DATE, TIMESTAMP, DECIMAL) and complex conditional
expressions
5. **Concurrent Execution**: Supports concurrent execution of multiple
rewrite tasks for improved processing efficiency

After execution, detailed statistics are returned, including:
- `rewritten_data_files_count`: Number of data files that were rewritten
- `added_data_files_count`: Number of new data files generated
- `rewritten_bytes_count`: Number of bytes rewritten
- `removed_delete_files_count`: Number of delete files removed

---

## Usage Example

### Basic Usage

```sql
-- Rewrite data files with default parameters
ALTER TABLE iceberg_catalog.db.table 
EXECUTE rewrite_data_files();
```

### Custom Parameters

```sql
-- Specify target file size and minimum input files
ALTER TABLE iceberg_catalog.db.table 
EXECUTE rewrite_data_files(
    "target-file-size-bytes" = "104857600",
    "min-input-files" = "3"
);
```

### Rewrite with WHERE Conditions

```sql
-- Rewrite only data within specific date range
ALTER TABLE iceberg_catalog.db.table 
EXECUTE rewrite_data_files(
    "target-file-size-bytes" = "104857600",
    "min-input-files" = "3",
    "delete-ratio-threshold" = "0.2"
) WHERE created_date >= '2024-01-01' AND status = 'active';

-- Rewrite data satisfying complex conditions
ALTER TABLE iceberg_catalog.db.table 
EXECUTE rewrite_data_files(
    "target-file-size-bytes" = "536870912"
) WHERE age > 25 AND salary > 50000.0 AND is_active = true;
```

### Rewrite All Files

```sql
-- Ignore file size limits and rewrite all files
ALTER TABLE iceberg_catalog.db.table 
EXECUTE rewrite_data_files("rewrite-all" = "true");
```

### Handle Delete Files

```sql
-- Trigger rewrite when delete file count or ratio exceeds threshold
ALTER TABLE iceberg_catalog.db.table 
EXECUTE rewrite_data_files(
    "delete-file-threshold" = "10",
    "delete-ratio-threshold" = "0.3"
);
```

---

## Parameter List

### File Size Parameters

| Parameter Name | Type | Default Value | Description |
|----------------|------|---------------|-------------|
| `target-file-size-bytes` | Long | 536870912 (512MB) | Target size in bytes for output files |
| `min-file-size-bytes` | Long | 0 (auto-calculated as 75% of target) | Minimum file size in bytes for files to be rewritten |
| `max-file-size-bytes` | Long | 0 (auto-calculated as 180% of target) | Maximum file size in bytes for files to be rewritten |

### Input Files Parameters

| Parameter Name | Type | Default Value | Description |
|----------------|------|---------------|-------------|
| `min-input-files` | Int | 5 | Minimum number of input files to rewrite together |
| `rewrite-all` | Boolean | false | Whether to rewrite all files regardless of size |
| `max-file-group-size-bytes` | Long | 107374182400 (100GB) | Maximum size in bytes for a file group to be rewritten |

### Delete Files Parameters

| Parameter Name | Type | Default Value | Description |
|----------------|------|---------------|-------------|
| `delete-file-threshold` | Int | Integer.MAX_VALUE | Minimum number of delete files to trigger rewrite |
| `delete-ratio-threshold` | Double | 0.3 | Minimum ratio of delete records to total records to trigger rewrite (0.0-1.0) |

### Output Specification Parameters

| Parameter Name | Type | Default Value | Description |
|----------------|------|---------------|-------------|
| `output-spec-id` | Long | 2 | Partition specification ID for output files |

### Parameter Notes

- If `min-file-size-bytes` is not specified, default value is
`target-file-size-bytes * 0.75`
- If `max-file-size-bytes` is not specified, default value is
`target-file-size-bytes * 1.8`
- File groups are only rewritten when they meet the `min-input-files`
condition
- `delete-file-threshold` and `delete-ratio-threshold` are used to
determine if rewrite is needed to handle delete files

---

## Execution Flow

### Overall Process

```
1. Parameter Validation and Table Retrieval
   ├─ Validate rewrite parameters
   ├─ Get Iceberg table reference
   └─ Check if table has data snapshots

2. File Planning and Grouping
   ├─ Use RewriteDataFilePlanner to plan file scan tasks
   ├─ Filter file scan tasks based on WHERE conditions
   ├─ Organize file groups by partition and size constraints
   └─ Filter file groups that don't meet rewrite conditions

3. Concurrent Rewrite Execution
   ├─ Create RewriteDataFileExecutor
   ├─ Execute multiple file group rewrite tasks concurrently
   ├─ Each task executes INSERT-SELECT statements
   └─ Wait for all tasks to complete

4. Transaction Commit and Result Return
   ├─ Commit transaction and create new snapshot
   ├─ Update table metadata
   └─ Return detailed execution result statistics
```

### Detailed Steps

#### Step 1: Parameter Validation and Table Retrieval
- Validate all parameters for validity and value ranges
- If table has no snapshots, return empty result directly
- Calculate default values for `min-file-size-bytes` and
`max-file-size-bytes` based on parameters

#### Step 2: File Planning and Grouping (RewriteDataFilePlanner)
- **File Scanning**: Build `TableScan` based on WHERE conditions to get
qualified `FileScanTask`
- **File Filtering**: Filter files based on `min-file-size-bytes`,
`max-file-size-bytes`, and `rewrite-all` parameters
- **Partition Grouping**: Group files into `RewriteDataGroup` by
partition specification
- **Size Constraints**: Ensure each file group doesn't exceed
`max-file-group-size-bytes`
- **Delete File Check**: Determine if rewrite is needed based on
`delete-file-threshold` and `delete-ratio-threshold`

#### Step 3: Concurrent Rewrite Execution (RewriteDataFileExecutor)
- **Task Creation**: Create `RewriteGroupTask` for each
`RewriteDataGroup`
- **Concurrent Execution**: Use thread pool to execute multiple rewrite
tasks concurrently
- **Data Writing**: Each task executes `INSERT INTO ... SELECT FROM ...`
statements to write data to new files
- **Progress Tracking**: Use atomic counters and `CountDownLatch` to
track task completion

#### Step 4: Transaction Commit and Result Return
- **Transaction Management**: Use `IcebergTransaction` to manage
transactions, ensuring atomicity
- **Metadata Update**: Commit transaction to create new snapshot and
update table metadata
- **Result Statistics**: Aggregate execution results from all tasks and
return statistics
suxiaogang223 added a commit to suxiaogang223/doris that referenced this pull request Nov 10, 2025
suxiaogang223 added a commit to suxiaogang223/doris that referenced this pull request Nov 11, 2025
suxiaogang223 added a commit to suxiaogang223/doris that referenced this pull request Nov 11, 2025
suxiaogang223 added a commit to suxiaogang223/doris that referenced this pull request Nov 12, 2025
suxiaogang223 added a commit to suxiaogang223/doris that referenced this pull request Nov 12, 2025
suxiaogang223 added a commit to suxiaogang223/doris that referenced this pull request Nov 13, 2025
…pache#56638)

Issue: apache#56002
Related: apache#55679

This PR converts the existing OPTIMIZE TABLE syntax into the more
standard ALTER TABLE EXECUTE action syntax, providing a unified
interface for table action operations across different table engines in
Apache Doris.

```sql
ALTER TABLE [catalog.]database.table
  EXECUTE action("key1" = "value1", "key2" = "value2", ...)
  [PARTITION (partition_list)]
  [WHERE condition]
```
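
For example, invoking one of the documented actions through the new syntax would look like the following (the catalog, table, and column names here are placeholders): `ALTER TABLE iceberg_catalog.db.sales EXECUTE rewrite_data_files("min-input-files" = "5") WHERE status = 'active';`
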
suxiaogang223 added a commit to suxiaogang223/doris that referenced this pull request Nov 13, 2025
…le optimization and compaction (apache#56413)

wyxxxcat pushed a commit to wyxxxcat/doris that referenced this pull request Nov 18, 2025
…le optimization and compaction (apache#56413)

### What problem does this PR solve?

**Issue Number:** apache#56002

**Related PR:** apache#55679 apache#56638

This PR implements the `rewrite_data_files` action for Apache Iceberg
tables in Doris, providing comprehensive table optimization and data
file compaction capabilities. This feature allows users to reorganize
data files to improve query performance, optimize storage efficiency,
and maintain delete files according to Iceberg's official specification.

---

## Feature Description

This PR implements the `rewrite_data_files` operation for Iceberg
tables, providing table optimization and data file compaction. The
feature follows Iceberg's official `RewriteDataFiles` specification and
offers the following core capabilities:

1. **Data File Compaction**: Merges multiple small files into larger
files, reducing file count and improving query performance
2. **Storage Efficiency Optimization**: Reduces storage overhead through
file reorganization and optimizes data distribution
3. **Delete File Management**: Properly handles and maintains delete
files, reducing filtering overhead during queries
4. **WHERE Condition Support**: Supports rewriting specific data ranges
through WHERE conditions, including various data types (BIGINT, STRING,
INT, DOUBLE, BOOLEAN, DATE, TIMESTAMP, DECIMAL) and complex conditional
expressions
5. **Concurrent Execution**: Supports concurrent execution of multiple
rewrite tasks for improved processing efficiency

After execution, detailed statistics are returned, including:
- `rewritten_data_files_count`: Number of data files that were rewritten
- `added_data_files_count`: Number of new data files generated
- `rewritten_bytes_count`: Number of bytes rewritten
- `removed_delete_files_count`: Number of delete files removed
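
To illustrate how these counters relate to each other (the numbers below are purely illustrative, not output from a real run), a compaction that merges many small files into a few large ones might report:

```
rewritten_data_files_count = 120          -- 120 small input files were replaced
added_data_files_count     = 10           -- by 10 newly written larger files
rewritten_bytes_count      = 1073741824   -- about 1 GB of data was rewritten
removed_delete_files_count = 4            -- 4 delete files were applied and dropped
```
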

---

## Usage Example

### Basic Usage

```sql
-- Rewrite data files with default parameters
ALTER TABLE iceberg_catalog.db.table 
EXECUTE rewrite_data_files();
```

### Custom Parameters

```sql
-- Specify target file size and minimum input files
ALTER TABLE iceberg_catalog.db.table 
EXECUTE rewrite_data_files(
    "target-file-size-bytes" = "104857600",
    "min-input-files" = "3"
);
```

### Rewrite with WHERE Conditions

```sql
-- Rewrite only data within specific date range
ALTER TABLE iceberg_catalog.db.table 
EXECUTE rewrite_data_files(
    "target-file-size-bytes" = "104857600",
    "min-input-files" = "3",
    "delete-ratio-threshold" = "0.2"
) WHERE created_date >= '2024-01-01' AND status = 'active';

-- Rewrite data satisfying complex conditions
ALTER TABLE iceberg_catalog.db.table 
EXECUTE rewrite_data_files(
    "target-file-size-bytes" = "536870912"
) WHERE age > 25 AND salary > 50000.0 AND is_active = true;
```

### Rewrite All Files

```sql
-- Ignore file size limits and rewrite all files
ALTER TABLE iceberg_catalog.db.table 
EXECUTE rewrite_data_files("rewrite-all" = "true");
```

### Handle Delete Files

```sql
-- Trigger rewrite when delete file count or ratio exceeds threshold
ALTER TABLE iceberg_catalog.db.table 
EXECUTE rewrite_data_files(
    "delete-file-threshold" = "10",
    "delete-ratio-threshold" = "0.3"
);
```

---

## Parameter List

### File Size Parameters

| Parameter Name | Type | Default Value | Description |
|----------------|------|---------------|-------------|
| `target-file-size-bytes` | Long | 536870912 (512MB) | Target size in
bytes for output files |
| `min-file-size-bytes` | Long | 0 (auto-calculated as 75% of target) |
Minimum file size in bytes for files to be rewritten |
| `max-file-size-bytes` | Long | 0 (auto-calculated as 180% of target) |
Maximum file size in bytes for files to be rewritten |

### Input Files Parameters

| Parameter Name | Type | Default Value | Description |
|----------------|------|---------------|-------------|
| `min-input-files` | Int | 5 | Minimum number of input files to rewrite
together |
| `rewrite-all` | Boolean | false | Whether to rewrite all files
regardless of size |
| `max-file-group-size-bytes` | Long | 107374182400 (100GB) | Maximum
size in bytes for a file group to be rewritten |

### Delete Files Parameters

| Parameter Name | Type | Default Value | Description |
|----------------|------|---------------|-------------|
| `delete-file-threshold` | Int | Integer.MAX_VALUE | Minimum number of
delete files to trigger rewrite |
| `delete-ratio-threshold` | Double | 0.3 | Minimum ratio of delete
records to total records to trigger rewrite (0.0-1.0) |
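
As a worked example of the ratio check (assuming the ratio is computed over the records covered by a candidate group, which is an assumption about scope rather than a statement of the exact implementation):

```
delete_ratio = delete_records / total_records = 40000 / 100000 = 0.4
0.4 >= 0.3 (delete-ratio-threshold)  ->  the group qualifies for rewrite
```
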

### Output Specification Parameters

| Parameter Name | Type | Default Value | Description |
|----------------|------|---------------|-------------|
| `output-spec-id` | Long | 2 | Partition specification ID for output
files |

### Parameter Notes

- If `min-file-size-bytes` is not specified, its default value is
`target-file-size-bytes * 0.75`
- If `max-file-size-bytes` is not specified, its default value is
`target-file-size-bytes * 1.8`
- File groups are only rewritten when they meet the `min-input-files`
condition
- `delete-file-threshold` and `delete-ratio-threshold` are used to
determine if rewrite is needed to handle delete files
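
For example, with the default `target-file-size-bytes` of 536870912 and neither bound specified, the auto-calculated values would be roughly (exact rounding may differ in the implementation):

```
min-file-size-bytes = 536870912 * 0.75 = 402653184   (384 MB)
max-file-size-bytes = 536870912 * 1.8  ≈ 966367641   (~922 MB)
```
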

---

## Execution Flow

### Overall Process

```
1. Parameter Validation and Table Retrieval
   ├─ Validate rewrite parameters
   ├─ Get Iceberg table reference
   └─ Check if table has data snapshots

2. File Planning and Grouping
   ├─ Use RewriteDataFilePlanner to plan file scan tasks
   ├─ Filter file scan tasks based on WHERE conditions
   ├─ Organize file groups by partition and size constraints
   └─ Filter file groups that don't meet rewrite conditions

3. Concurrent Rewrite Execution
   ├─ Create RewriteDataFileExecutor
   ├─ Execute multiple file group rewrite tasks concurrently
   ├─ Each task executes INSERT-SELECT statements
   └─ Wait for all tasks to complete

4. Transaction Commit and Result Return
   ├─ Commit transaction and create new snapshot
   ├─ Update table metadata
   └─ Return detailed execution result statistics
```

### Detailed Steps

#### Step 1: Parameter Validation and Table Retrieval
- Validate all parameters and check that their values fall within the
allowed ranges
- If the table has no snapshots, return an empty result immediately
- Calculate default values for `min-file-size-bytes` and
`max-file-size-bytes` from the other parameters

#### Step 2: File Planning and Grouping (RewriteDataFilePlanner)
- **File Scanning**: Build `TableScan` based on WHERE conditions to get
qualified `FileScanTask`
- **File Filtering**: Filter files based on `min-file-size-bytes`,
`max-file-size-bytes`, and `rewrite-all` parameters
- **Partition Grouping**: Group files into `RewriteDataGroup` by
partition specification
- **Size Constraints**: Ensure each file group doesn't exceed
`max-file-group-size-bytes`
- **Delete File Check**: Determine if rewrite is needed based on
`delete-file-threshold` and `delete-ratio-threshold`
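
A minimal sketch of this selection step, written against Iceberg's public Java API rather than the actual `RewriteDataFilePlanner` (variable names and the exact combination of checks are assumptions made for illustration):

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.iceberg.FileScanTask;
import org.apache.iceberg.Table;
import org.apache.iceberg.expressions.Expression;
import org.apache.iceberg.io.CloseableIterable;

public final class RewriteCandidateSketch {
    /** Collect the FileScanTasks that qualify for rewriting under the documented thresholds. */
    static List<FileScanTask> planCandidates(Table table, Expression where, long minFileSizeBytes,
            long maxFileSizeBytes, int deleteFileThreshold, boolean rewriteAll) throws Exception {
        List<FileScanTask> candidates = new ArrayList<>();
        try (CloseableIterable<FileScanTask> tasks = table.newScan().filter(where).planFiles()) {
            for (FileScanTask task : tasks) {
                long size = task.length();
                // Files outside the [min, max] size window are candidates; rewrite-all ignores size.
                boolean sizeQualifies = rewriteAll || size < minFileSizeBytes || size > maxFileSizeBytes;
                // Files carrying at least delete-file-threshold delete files also qualify.
                boolean deleteQualifies = task.deletes().size() >= deleteFileThreshold;
                if (sizeQualifies || deleteQualifies) {
                    candidates.add(task);
                }
            }
        }
        return candidates;
    }
}
```

Partition grouping and the `max-file-group-size-bytes` cap would then be applied on top of this candidate list.
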

#### Step 3: Concurrent Rewrite Execution (RewriteDataFileExecutor)
- **Task Creation**: Create `RewriteGroupTask` for each
`RewriteDataGroup`
- **Concurrent Execution**: Use thread pool to execute multiple rewrite
tasks concurrently
- **Data Writing**: Each task executes `INSERT INTO ... SELECT FROM ...`
statements to write data to new files
- **Progress Tracking**: Use atomic counters and `CountDownLatch` to
track task completion
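
A minimal sketch of this coordination pattern using plain Java concurrency utilities (the task body and the counter are placeholders, not the actual `RewriteGroupTask`):

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

public final class ConcurrentRewriteSketch {
    public static long runGroups(List<Runnable> groupTasks, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch done = new CountDownLatch(groupTasks.size());
        AtomicLong finishedGroups = new AtomicLong();      // aggregated progress counter
        for (Runnable task : groupTasks) {
            pool.submit(() -> {
                try {
                    task.run();                            // e.g. run the INSERT ... SELECT for one group
                    finishedGroups.incrementAndGet();
                } finally {
                    done.countDown();                      // always signal completion, even on failure
                }
            });
        }
        done.await();                                      // wait for every file group to finish
        pool.shutdown();
        return finishedGroups.get();
    }
}
```
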

#### Step 4: Transaction Commit and Result Return
- **Transaction Management**: Use `IcebergTransaction` to manage
transactions, ensuring atomicity
- **Metadata Update**: Commit transaction to create new snapshot and
update table metadata
- **Result Statistics**: Aggregate execution results from all tasks and
return statistics
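
For context, the atomic swap that the commit step relies on can be expressed with Iceberg's public API as follows; this is a generic sketch of that API, not the code path taken by Doris's `IcebergTransaction`:

```java
import java.util.Set;

import org.apache.iceberg.DataFile;
import org.apache.iceberg.Table;

public final class CommitRewriteSketch {
    /** Replace the rewritten data files with the newly written ones in a single new snapshot. */
    static void commitRewrite(Table table, Set<DataFile> oldFiles, Set<DataFile> newFiles) {
        table.newRewrite()
             .rewriteFiles(oldFiles, newFiles)   // stage removal of old files and addition of new ones
             .commit();                          // atomically produces a new table snapshot
    }
}
```
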
@suxiaogang223 suxiaogang223 deleted the support_optimize_gram branch January 17, 2026 15:54
Labels: approved (Indicates a PR has been approved by one committer), reviewed

Projects: None yet

7 participants