[feature](multi-catalog) Add max_file_split_num session variable to prevent OOM in file scan#58759

Merged
morningman merged 5 commits into apache:master from suxiaogang223:max_external_split_num on Jan 29, 2026

@suxiaogang223 suxiaogang223 commented Dec 5, 2025

What problem does this PR solve?

Problem Summary

When querying tables in external catalogs (Hive, Iceberg, Paimon, etc.), Doris divides each file into splits for parallel processing. In some cases, especially with numerous small files, this can generate an excessive number of splits, potentially causing:

  1. Memory pressure: Too many splits consume significant memory in the FE
  2. OOM issues: Excessive split generation can lead to OutOfMemoryError
  3. Performance degradation: Managing too many splits increases query planning overhead

Previously, there was no upper limit on the number of splits in non-batch mode, which could lead to problems when querying tables with many small files.

Solution

This PR introduces a new session variable max_file_split_num to limit the maximum number of splits allowed per table scan in non-batch mode.

Changes

  1. New Session Variable: max_file_split_num

    • Type: int
    • Default: 100000
    • Description: "In non-batch mode, the maximum number of splits allowed per table scan, which prevents OOM caused by generating too many splits."
    • Forward to BE: true
  2. Implementation in FileQueryScanNode:

    • Added method applyMaxFileSplitNumLimit(long targetSplitSize, long totalFileSize)
    • Dynamically calculates minimum split size to ensure split count doesn't exceed the limit
    • Formula: minSplitSizeForMaxNum = (totalFileSize + maxFileSplitNum - 1) / maxFileSplitNum
    • Returns: Math.max(targetSplitSize, minSplitSizeForMaxNum)
  3. Applied to multiple scan nodes:

    • HiveScanNode
    • IcebergScanNode
    • PaimonScanNode
    • TVFScanNode
  4. Unit Tests:

    • FileQueryScanNodeTest: Test base logic
    • HiveScanNodeTest: Test Hive-specific implementation
    • IcebergScanNodeTest: Test Iceberg-specific implementation
    • PaimonScanNodeTest: Test Paimon-specific implementation
    • TVFScanNodeTest: Test TVF-specific implementation
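
The calculation described in item 2 can be sketched roughly as follows. This is an illustration built only from the formula and return expression listed above, not the code merged in this PR: the class name `SplitSizeLimitSketch` is hypothetical, the limit is passed as a plain parameter instead of being read from the session, and the early return for a non-positive limit or an empty input is an assumption based on the "set to 0 or negative" note under Usage.

```java
public final class SplitSizeLimitSketch {

    /**
     * Hypothetical stand-in for the method described in the PR.
     * maxFileSplitNum is an explicit parameter here (assumption);
     * in Doris it would come from the max_file_split_num session variable.
     */
    static long applyMaxFileSplitNumLimit(long targetSplitSize, long totalFileSize, int maxFileSplitNum) {
        // Assumption: a non-positive limit (or nothing to scan) leaves the target size unchanged.
        if (maxFileSplitNum <= 0 || totalFileSize <= 0) {
            return targetSplitSize;
        }
        // Ceiling division: the smallest split size for which
        // ceil(totalFileSize / splitSize) does not exceed maxFileSplitNum.
        long minSplitSizeForMaxNum = (totalFileSize + maxFileSplitNum - 1) / maxFileSplitNum;
        // The limit can only enlarge splits, never shrink them below the target.
        return Math.max(targetSplitSize, minSplitSizeForMaxNum);
    }

    public static void main(String[] args) {
        long targetSplitSize = 64L * 1024 * 1024;               // 64 MiB, illustrative value
        long totalFileSize = 10L * 1024 * 1024 * 1024 * 1024;   // 10 TiB, illustrative value

        System.out.println("limit 100000 -> split size "
                + applyMaxFileSplitNumLimit(targetSplitSize, totalFileSize, 100000));
        System.out.println("limit 0      -> split size "
                + applyMaxFileSplitNumLimit(targetSplitSize, totalFileSize, 0));
    }
}
```

Rounding up keeps the total-size-based split count within the limit, and `Math.max` means small scans that are already under the limit keep their original target split size.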

Usage

Users can now control the maximum number of splits per table scan by setting the session variable:

-- Set to 50000 splits maximum
SET max_file_split_num = 50000;

-- Disable the limit (set to 0 or negative)
SET max_file_split_num = 0;
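
To get a feel for what the two settings above change, the sketch below estimates split counts for a hypothetical scan of 100 files of 1 GiB each with a 1 MiB target split size. It assumes the formula listed under Changes and that every file is cut independently into ceil(fileSize / splitSize) pieces. Both helper names are hypothetical, and the real planner may behave differently.

```java
import java.util.Arrays;

public final class SplitCountEstimate {

    /** Hypothetical helper: split size after applying the limit (limit <= 0 means disabled). */
    static long effectiveSplitSize(long targetSplitSize, long totalFileSize, int maxFileSplitNum) {
        if (maxFileSplitNum <= 0 || totalFileSize <= 0) {
            return targetSplitSize;
        }
        long minSplitSize = (totalFileSize + maxFileSplitNum - 1) / maxFileSplitNum;
        return Math.max(targetSplitSize, minSplitSize);
    }

    /** Assumption for illustration: each file yields ceil(size / splitSize) splits. */
    static long estimateSplitCount(long[] fileSizes, long splitSize) {
        long count = 0;
        for (long size : fileSizes) {
            if (size > 0) {
                count += (size + splitSize - 1) / splitSize;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        long[] fileSizes = new long[100];
        Arrays.fill(fileSizes, 1024L * 1024 * 1024);    // 100 files of 1 GiB (made-up workload)
        long total = Arrays.stream(fileSizes).sum();
        long target = 1024L * 1024;                     // 1 MiB target split size (made-up)

        // Mirrors "SET max_file_split_num = 0;" (limit disabled).
        long unlimited = estimateSplitCount(fileSizes, effectiveSplitSize(target, total, 0));
        // Mirrors "SET max_file_split_num = 50000;".
        long limited = estimateSplitCount(fileSizes, effectiveSplitSize(target, total, 50000));

        System.out.println("limit disabled: " + unlimited + " splits");
        System.out.println("limit 50000:    " + limited + " splits");
    }
}
```

Under these assumptions the estimate drops from 102400 splits with the limit disabled to 50000 splits with the limit set to 50000.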

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Dec 5, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@suxiaogang223
Contributor Author

run buildall

2 similar comments
@suxiaogang223
Contributor Author

run buildall

@suxiaogang223
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 34570 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit dd684a2f23bf54f84260040a028e891620b229bc, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17610	5044	4873	4873
q2	2043	314	192	192
q3	10273	1291	716	716
q4	10202	803	306	306
q5	7527	2403	2154	2154
q6	179	163	133	133
q7	967	790	632	632
q8	9338	1405	1070	1070
q9	7007	5310	5277	5277
q10	6832	2195	1791	1791
q11	520	310	294	294
q12	339	367	230	230
q13	17790	3681	3093	3093
q14	238	237	215	215
q15	598	524	517	517
q16	893	883	814	814
q17	669	796	486	486
q18	7441	7179	7739	7179
q19	1398	995	621	621
q20	420	371	245	245
q21	4215	4035	2691	2691
q22	1106	1096	1041	1041
Total cold run time: 107605 ms
Total hot run time: 34570 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	5262	5095	5165	5095
q2	337	417	323	323
q3	2337	2821	2513	2513
q4	1517	1874	1493	1493
q5	4584	4438	4445	4438
q6	218	167	125	125
q7	1957	1967	1845	1845
q8	2632	2605	2514	2514
q9	7602	7512	7154	7154
q10	2948	3097	2625	2625
q11	562	497	472	472
q12	619	694	549	549
q13	3236	3624	3009	3009
q14	261	287	285	285
q15	536	512	499	499
q16	879	913	860	860
q17	1112	1341	1321	1321
q18	7236	7270	6926	6926
q19	829	790	815	790
q20	1959	1967	1839	1839
q21	4675	4222	4230	4222
q22	1076	1062	965	965
Total cold run time: 52374 ms
Total hot run time: 49862 ms

@doris-robot

TPC-DS: Total hot run time: 180264 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit dd684a2f23bf54f84260040a028e891620b229bc, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query5	5156	654	509	509
query6	338	256	211	211
query7	4660	467	287	287
query8	312	269	237	237
query9	8747	2628	2610	2610
query10	556	308	285	285
query11	15616	15050	14626	14626
query12	176	116	116	116
query13	1676	498	396	396
query14	6179	3317	3057	3057
query14_1	2906	2965	2923	2923
query15	212	200	188	188
query16	7720	485	484	484
query17	1191	693	584	584
query18	2035	426	321	321
query19	204	190	157	157
query20	130	123	116	116
query21	215	132	110	110
query22	3892	3993	3857	3857
query23	16524	16169	15921	15921
query23_1	16214	16191	15912	15912
query24	7174	1641	1198	1198
query24_1	1202	1184	1222	1184
query25	611	477	414	414
query26	1248	289	208	208
query27	2878	466	304	304
query28	4379	2167	2160	2160
query29	813	549	453	453
query30	313	242	221	221
query31	810	689	620	620
query32	81	74	69	69
query33	653	347	297	297
query34	873	883	547	547
query35	808	832	733	733
query36	895	903	816	816
query37	122	90	84	84
query38	3840	3936	3754	3754
query39	760	741	734	734
query39_1	709	692	701	692
query40	221	130	118	118
query41	67	62	63	62
query42	129	100	96	96
query43	445	417	394	394
query44	1320	772	758	758
query45	195	190	186	186
query46	893	970	595	595
query47	1698	1725	1625	1625
query48	408	319	235	235
query49	797	430	347	347
query50	702	317	231	231
query51	3845	3894	3978	3894
query52	119	100	89	89
query53	233	235	179	179
query54	338	266	249	249
query55	100	80	76	76
query56	340	303	301	301
query57	1164	1129	1133	1129
query58	302	264	255	255
query59	2285	2355	2342	2342
query60	372	315	305	305
query61	194	193	190	190
query62	786	684	652	652
query63	237	180	184	180
query64	4675	1309	897	897
query65	4039	3945	3961	3945
query66	1172	447	342	342
query67	15499	15048	14966	14966
query68	4744	965	672	672
query69	526	295	271	271
query70	1122	1005	983	983
query71	431	297	280	280
query72	5981	4923	5114	4923
query73	699	581	303	303
query74	8613	8888	8763	8763
query75	3026	3026	2567	2567
query76	3296	1127	738	738
query77	531	397	308	308
query78	9501	9643	8835	8835
query79	1547	849	591	591
query80	1710	545	457	457
query81	545	270	244	244
query82	412	128	101	101
query83	385	272	259	259
query84	254	120	97	97
query85	964	512	454	454
query86	381	308	284	284
query87	4015	4068	3887	3887
query88	2943	2122	2124	2122
query89	398	326	283	283
query90	1840	175	174	174
query91	178	167	141	141
query92	73	67	66	66
query93	1191	1031	687	687
query94	767	285	281	281
query95	566	341	326	326
query96	549	489	212	212
query97	2621	2671	2632	2632
query98	241	203	195	195
query99	1321	1333	1227	1227
Total cold run time: 266819 ms
Total hot run time: 180264 ms

@doris-robot

ClickBench: Total hot run time: 27.19 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit dd684a2f23bf54f84260040a028e891620b229bc, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.05	0.05	0.05
query2	0.10	0.05	0.04
query3	0.25	0.09	0.09
query4	1.61	0.12	0.11
query5	0.27	0.24	0.27
query6	1.16	0.64	0.62
query7	0.03	0.02	0.03
query8	0.05	0.04	0.05
query9	0.57	0.51	0.50
query10	0.56	0.56	0.57
query11	0.15	0.10	0.11
query12	0.16	0.11	0.12
query13	0.64	0.59	0.60
query14	0.99	0.98	0.98
query15	0.82	0.80	0.80
query16	0.40	0.39	0.39
query17	1.07	1.01	1.04
query18	0.24	0.21	0.22
query19	1.91	1.81	1.74
query20	0.02	0.01	0.02
query21	15.46	0.28	0.14
query22	4.84	0.05	0.05
query23	16.03	0.29	0.10
query24	1.63	0.63	0.18
query25	0.07	0.05	0.06
query26	0.14	0.13	0.13
query27	0.06	0.05	0.04
query28	4.27	1.21	1.02
query29	12.57	3.99	3.26
query30	0.27	0.14	0.11
query31	2.81	0.61	0.40
query32	3.22	0.56	0.46
query33	3.04	3.09	3.12
query34	16.83	5.19	4.56
query35	4.60	4.55	4.58
query36	0.64	0.49	0.50
query37	0.10	0.06	0.07
query38	0.07	0.05	0.04
query39	0.04	0.03	0.03
query40	0.18	0.15	0.13
query41	0.08	0.04	0.03
query42	0.04	0.03	0.03
query43	0.04	0.03	0.04
Total cold run time: 98.08 s
Total hot run time: 27.19 s

@suxiaogang223
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 34415 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 906053626d692934eef5a18e9be90e03cf87a863, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17625	4998	4889	4889
q2	2043	316	197	197
q3	10237	1314	733	733
q4	10207	800	328	328
q5	7530	2465	2152	2152
q6	208	172	135	135
q7	966	775	637	637
q8	9356	1399	1098	1098
q9	7064	5324	5383	5324
q10	6812	2215	1830	1830
q11	508	329	290	290
q12	352	377	226	226
q13	17766	3700	3057	3057
q14	234	235	212	212
q15	574	522	517	517
q16	885	867	807	807
q17	684	800	519	519
q18	7391	7078	7130	7078
q19	961	971	597	597
q20	380	348	219	219
q21	3979	3522	2600	2600
q22	1032	992	970	970
Total cold run time: 106794 ms
Total hot run time: 34415 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4953	4911	4921	4911
q2	336	418	323	323
q3	2126	2716	2286	2286
q4	1322	1776	1285	1285
q5	4265	4821	4660	4660
q6	219	175	126	126
q7	2050	2003	1814	1814
q8	2664	2435	2559	2435
q9	7814	7599	7697	7599
q10	3028	3234	2875	2875
q11	618	507	501	501
q12	709	725	608	608
q13	3589	3954	3391	3391
q14	277	300	269	269
q15	552	502	498	498
q16	884	940	887	887
q17	1175	1414	1451	1414
q18	8149	7764	7702	7702
q19	859	894	916	894
q20	2045	2109	1932	1932
q21	4984	4421	4169	4169
q22	1085	1031	997	997
Total cold run time: 53703 ms
Total hot run time: 51576 ms

@doris-robot

TPC-DS: Total hot run time: 180789 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 906053626d692934eef5a18e9be90e03cf87a863, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query5	5139	663	501	501
query6	346	232	211	211
query7	4668	480	298	298
query8	308	270	245	245
query9	8735	2672	2666	2666
query10	578	332	274	274
query11	15345	14804	14796	14796
query12	197	123	123	123
query13	1687	470	369	369
query14	6320	3334	3053	3053
query14_1	2932	2908	2950	2908
query15	215	198	185	185
query16	7681	500	456	456
query17	1220	739	613	613
query18	2080	433	348	348
query19	227	193	170	170
query20	136	125	122	122
query21	225	137	115	115
query22	4095	3942	3835	3835
query23	16681	16288	16084	16084
query23_1	16118	16036	16132	16036
query24	7294	1639	1220	1220
query24_1	1247	1233	1219	1219
query25	658	512	454	454
query26	1273	310	179	179
query27	2876	490	326	326
query28	4403	2212	2203	2203
query29	848	623	436	436
query30	319	248	215	215
query31	844	699	620	620
query32	78	67	73	67
query33	668	342	296	296
query34	869	914	549	549
query35	800	824	725	725
query36	912	923	839	839
query37	124	91	79	79
query38	3841	3814	3809	3809
query39	749	755	720	720
query39_1	695	708	712	708
query40	228	132	116	116
query41	66	61	59	59
query42	128	99	97	97
query43	441	428	390	390
query44	1306	755	773	755
query45	201	191	185	185
query46	927	962	595	595
query47	1704	1715	1613	1613
query48	417	325	229	229
query49	772	436	376	376
query50	697	305	243	243
query51	3891	3863	3927	3863
query52	127	98	87	87
query53	239	232	177	177
query54	308	272	246	246
query55	105	81	80	80
query56	330	297	295	295
query57	1160	1152	1111	1111
query58	333	275	256	256
query59	2285	2347	2314	2314
query60	354	312	307	307
query61	164	154	157	154
query62	802	677	650	650
query63	234	178	176	176
query64	4535	1179	897	897
query65	4068	3991	4023	3991
query66	1208	444	339	339
query67	15538	14882	14805	14805
query68	8331	946	669	669
query69	523	299	265	265
query70	1124	994	961	961
query71	428	296	265	265
query72	5930	4954	4872	4872
query73	680	548	300	300
query74	8544	8848	8761	8761
query75	3034	3048	2516	2516
query76	3363	1126	745	745
query77	536	391	308	308
query78	9447	9646	8897	8897
query79	1283	870	584	584
query80	660	562	493	493
query81	521	268	243	243
query82	217	130	106	106
query83	283	275	260	260
query84	281	129	95	95
query85	905	496	456	456
query86	377	308	294	294
query87	4047	4085	4009	4009
query88	2948	2184	2116	2116
query89	394	325	282	282
query90	2066	166	160	160
query91	184	167	138	138
query92	89	69	66	66
query93	1693	1065	679	679
query94	774	309	287	287
query95	575	390	344	344
query96	545	479	212	212
query97	2631	2711	2576	2576
query98	247	210	207	207
query99	1357	1319	1203	1203
Total cold run time: 269955 ms
Total hot run time: 180789 ms

@doris-robot

ClickBench: Total hot run time: 27.57 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 906053626d692934eef5a18e9be90e03cf87a863, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.05	0.05	0.05
query2	0.11	0.04	0.05
query3	0.26	0.10	0.09
query4	1.61	0.11	0.11
query5	0.29	0.26	0.25
query6	1.16	0.65	0.64
query7	0.03	0.03	0.03
query8	0.06	0.05	0.04
query9	0.58	0.52	0.51
query10	0.56	0.55	0.57
query11	0.16	0.11	0.12
query12	0.15	0.11	0.12
query13	0.62	0.60	0.61
query14	1.00	0.98	0.98
query15	0.81	0.80	0.79
query16	0.40	0.39	0.39
query17	1.02	1.02	1.01
query18	0.22	0.21	0.22
query19	1.95	1.91	1.87
query20	0.02	0.01	0.02
query21	15.46	0.31	0.13
query22	4.68	0.05	0.05
query23	16.12	0.27	0.11
query24	0.99	0.62	0.47
query25	0.09	0.06	0.06
query26	0.13	0.14	0.13
query27	0.05	0.05	0.05
query28	3.86	1.23	1.03
query29	12.58	4.10	3.31
query30	0.28	0.14	0.12
query31	2.82	0.64	0.40
query32	3.23	0.57	0.46
query33	2.98	3.07	3.13
query34	16.93	5.17	4.44
query35	4.54	4.57	4.53
query36	0.66	0.51	0.48
query37	0.11	0.07	0.07
query38	0.07	0.04	0.03
query39	0.05	0.03	0.02
query40	0.17	0.14	0.13
query41	0.08	0.03	0.03
query42	0.04	0.03	0.02
query43	0.04	0.04	0.04
Total cold run time: 97.02 s
Total hot run time: 27.57 s

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 37.50% (36/96) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 67.71% (65/96) 🎉
Increment coverage report
Complete coverage report

@suxiaogang223 suxiaogang223 force-pushed the max_external_split_num branch from 9060536 to 7c2c774 Compare January 22, 2026 03:15
@suxiaogang223 suxiaogang223 force-pushed the max_external_split_num branch from 54bf9ca to 10f5780 Compare January 22, 2026 08:07
@suxiaogang223
Contributor Author

run buildall

@suxiaogang223 suxiaogang223 changed the title from "[Feature](multi-catalog) Add max_file_splits_num to prevent OOM when file_split_size is too small" to "[Feature](multi-catalog) Add max_file_split_num session variable to prevent OOM in file scan" Jan 22, 2026
@suxiaogang223 suxiaogang223 changed the title from "[Feature](multi-catalog) Add max_file_split_num session variable to prevent OOM in file scan" to "[feature](multi-catalog) Add max_file_split_num session variable to prevent OOM in file scan" Jan 22, 2026
@suxiaogang223
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 30962 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a616fb07c2b28884d896c1b2c550d1fd9ce8c8bc, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17628	4804	4570	4570
q2	2013	328	189	189
q3	10231	1286	725	725
q4	10214	830	304	304
q5	7548	2080	1803	1803
q6	186	170	141	141
q7	856	727	567	567
q8	9272	1351	1056	1056
q9	4893	4696	4472	4472
q10	6777	1665	1225	1225
q11	530	308	288	288
q12	338	367	219	219
q13	17780	3806	3081	3081
q14	237	237	208	208
q15	601	533	528	528
q16	649	648	583	583
q17	654	825	468	468
q18	6589	6471	6683	6471
q19	1331	1041	692	692
q20	453	358	261	261
q21	2933	2251	2036	2036
q22	1113	1075	1090	1075
Total cold run time: 102826 ms
Total hot run time: 30962 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4962	4926	4937	4926
q2	339	395	331	331
q3	2366	2980	2529	2529
q4	1580	1951	1444	1444
q5	4571	4571	4271	4271
q6	225	170	125	125
q7	2039	1970	1753	1753
q8	2540	2382	2343	2343
q9	7078	7347	7066	7066
q10	2589	2959	2243	2243
q11	542	478	476	476
q12	698	776	631	631
q13	3567	3850	3079	3079
q14	266	283	265	265
q15	539	511	499	499
q16	626	658	609	609
q17	1065	1215	1242	1215
q18	7342	7138	7353	7138
q19	846	786	801	786
q20	1900	1960	1821	1821
q21	4564	4195	4014	4014
q22	1061	1037	992	992
Total cold run time: 51305 ms
Total hot run time: 48556 ms

@doris-robot

TPC-DS: Total hot run time: 172276 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit a616fb07c2b28884d896c1b2c550d1fd9ce8c8bc, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query5	4409	630	505	505
query6	337	228	205	205
query7	4231	464	255	255
query8	342	273	253	253
query9	8698	2843	2868	2843
query10	473	327	303	303
query11	15179	15036	14880	14880
query12	184	122	116	116
query13	1258	470	391	391
query14	6259	3035	2783	2783
query14_1	2671	2647	2634	2634
query15	208	195	174	174
query16	979	489	494	489
query17	1099	679	603	603
query18	2477	424	333	333
query19	193	180	156	156
query20	121	121	117	117
query21	217	139	123	123
query22	3783	4050	3854	3854
query23	15902	15562	15415	15415
query23_1	15478	15415	15504	15415
query24	7107	1550	1173	1173
query24_1	1151	1162	1172	1162
query25	557	473	413	413
query26	1241	273	157	157
query27	2757	449	282	282
query28	4538	2160	2141	2141
query29	771	554	465	465
query30	307	243	205	205
query31	795	635	563	563
query32	88	81	77	77
query33	541	372	333	333
query34	900	868	542	542
query35	720	773	672	672
query36	868	917	832	832
query37	137	99	90	90
query38	2716	2755	2632	2632
query39	784	758	737	737
query39_1	708	719	719	719
query40	221	137	123	123
query41	73	68	68	68
query42	97	96	97	96
query43	467	447	392	392
query44	1330	756	767	756
query45	196	187	184	184
query46	848	948	590	590
query47	1425	1505	1421	1421
query48	306	327	237	237
query49	602	418	341	341
query50	672	272	199	199
query51	3827	3781	3756	3756
query52	96	90	79	79
query53	205	226	177	177
query54	281	256	242	242
query55	82	81	77	77
query56	303	299	306	299
query57	994	966	909	909
query58	267	257	257	257
query59	2107	2181	2149	2149
query60	333	332	318	318
query61	150	147	143	143
query62	383	353	302	302
query63	198	164	159	159
query64	4825	1147	827	827
query65	3772	3711	3746	3711
query66	1471	414	324	324
query67	15323	15600	15347	15347
query68	2546	1074	720	720
query69	406	310	267	267
query70	1003	911	856	856
query71	309	294	273	273
query72	5309	3079	3136	3079
query73	605	737	318	318
query74	8679	8737	8503	8503
query75	2304	2304	1863	1863
query76	2272	1057	659	659
query77	352	384	304	304
query78	9587	9882	9129	9129
query79	1062	911	575	575
query80	1275	535	454	454
query81	541	262	231	231
query82	1037	157	115	115
query83	319	258	238	238
query84	257	124	103	103
query85	892	472	408	408
query86	432	339	299	299
query87	2868	2870	2705	2705
query88	3516	2570	2548	2548
query89	305	258	244	244
query90	1984	185	169	169
query91	162	159	131	131
query92	78	73	73	73
query93	1148	1017	662	662
query94	639	286	266	266
query95	578	339	384	339
query96	631	508	231	231
query97	2372	2338	2294	2294
query98	209	206	198	198
query99	602	583	576	576
Total cold run time: 245762 ms
Total hot run time: 172276 ms

@doris-robot

ClickBench: Total hot run time: 27.07 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit a616fb07c2b28884d896c1b2c550d1fd9ce8c8bc, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.05	0.05	0.05
query2	0.10	0.05	0.05
query3	0.26	0.09	0.08
query4	1.61	0.12	0.10
query5	0.28	0.24	0.26
query6	1.14	0.66	0.64
query7	0.04	0.03	0.02
query8	0.05	0.04	0.04
query9	0.56	0.49	0.50
query10	0.56	0.57	0.54
query11	0.14	0.09	0.10
query12	0.13	0.10	0.11
query13	0.60	0.58	0.58
query14	0.94	0.93	0.94
query15	0.81	0.79	0.78
query16	0.38	0.42	0.38
query17	0.95	1.04	1.00
query18	0.24	0.22	0.22
query19	1.96	1.90	1.86
query20	0.02	0.01	0.01
query21	15.44	0.26	0.15
query22	5.21	0.05	0.04
query23	15.94	0.29	0.10
query24	1.23	0.68	0.79
query25	0.09	0.05	0.06
query26	0.14	0.13	0.14
query27	0.06	0.06	0.05
query28	4.05	1.06	0.88
query29	12.54	3.92	3.10
query30	0.28	0.13	0.11
query31	2.82	0.64	0.40
query32	3.24	0.57	0.46
query33	3.07	3.02	3.03
query34	16.26	5.06	4.45
query35	4.42	4.45	4.41
query36	0.65	0.50	0.49
query37	0.11	0.07	0.07
query38	0.07	0.04	0.04
query39	0.04	0.03	0.02
query40	0.17	0.14	0.13
query41	0.09	0.03	0.03
query42	0.04	0.03	0.03
query43	0.05	0.04	0.03
Total cold run time: 96.83 s
Total hot run time: 27.07 s

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 24.24% (8/33) 🎉
Increment coverage report
Complete coverage report

@suxiaogang223
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 30690 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 5401b3d51b5cf1d302dc2ba87cb535c614a2bc9a, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17701	4747	4608	4608
q2	2163	312	197	197
q3	10257	1256	727	727
q4	10233	888	319	319
q5	7531	2096	1795	1795
q6	189	170	138	138
q7	882	692	586	586
q8	9248	1323	1148	1148
q9	4924	4627	4489	4489
q10	7063	1646	1240	1240
q11	568	300	272	272
q12	373	367	219	219
q13	17793	3798	3158	3158
q14	240	234	224	224
q15	599	567	529	529
q16	650	637	583	583
q17	677	799	477	477
q18	6640	6284	6396	6284
q19	1591	973	613	613
q20	400	349	236	236
q21	2583	2158	1897	1897
q22	1014	984	951	951
Total cold run time: 103319 ms
Total hot run time: 30690 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4799	6412	4694	4694
q2	319	418	326	326
q3	2217	2680	2317	2317
q4	1326	1828	1326	1326
q5	4233	4051	4168	4051
q6	208	171	134	134
q7	1972	2219	1911	1911
q8	2602	2512	2498	2498
q9	7135	7245	7206	7206
q10	2535	2680	2324	2324
q11	562	493	453	453
q12	696	768	596	596
q13	3603	4061	3524	3524
q14	307	299	278	278
q15	549	554	543	543
q16	667	737	643	643
q17	1163	1331	1405	1331
q18	7935	7776	7751	7751
q19	908	832	784	784
q20	2023	2110	1939	1939
q21	4714	4398	4371	4371
q22	1083	1030	972	972
Total cold run time: 51556 ms
Total hot run time: 49972 ms

@doris-robot

TPC-DS: Total hot run time: 172452 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 5401b3d51b5cf1d302dc2ba87cb535c614a2bc9a, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query5	4383	620	483	483
query6	332	226	207	207
query7	4243	447	273	273
query8	364	258	265	258
query9	8736	2901	2895	2895
query10	408	315	290	290
query11	15095	15151	14816	14816
query12	175	121	116	116
query13	1259	471	382	382
query14	6297	3056	2797	2797
query14_1	2688	2685	2629	2629
query15	208	193	176	176
query16	1091	490	469	469
query17	1137	651	550	550
query18	2413	438	343	343
query19	187	175	143	143
query20	121	116	118	116
query21	287	132	119	119
query22	3895	3929	3913	3913
query23	16037	15718	15209	15209
query23_1	15467	15497	15456	15456
query24	7156	1638	1151	1151
query24_1	1167	1158	1173	1158
query25	511	439	402	402
query26	1296	255	149	149
query27	2730	445	274	274
query28	4531	2155	2147	2147
query29	758	522	435	435
query30	342	234	203	203
query31	743	625	561	561
query32	85	75	71	71
query33	532	356	320	320
query34	877	856	543	543
query35	721	756	681	681
query36	857	911	820	820
query37	170	95	119	95
query38	2734	2678	2633	2633
query39	790	760	716	716
query39_1	705	710	700	700
query40	227	134	116	116
query41	66	62	61	61
query42	99	93	94	93
query43	421	439	423	423
query44	1332	746	747	746
query45	189	195	178	178
query46	831	957	572	572
query47	1442	1471	1272	1272
query48	322	325	234	234
query49	609	415	347	347
query50	683	271	203	203
query51	3844	3873	3778	3778
query52	93	90	83	83
query53	211	216	171	171
query54	285	254	267	254
query55	91	81	76	76
query56	299	292	293	292
query57	1037	1025	897	897
query58	280	259	264	259
query59	2110	2081	2037	2037
query60	332	324	308	308
query61	146	148	142	142
query62	414	348	307	307
query63	191	164	163	163
query64	4808	1161	831	831
query65	3865	3779	3733	3733
query66	1452	413	317	317
query67	15448	15540	15397	15397
query68	2423	1060	718	718
query69	391	308	294	294
query70	1018	948	906	906
query71	311	303	268	268
query72	5274	3149	3232	3149
query73	612	737	312	312
query74	8662	8719	8565	8565
query75	2278	2362	1873	1873
query76	2275	1062	671	671
query77	382	383	312	312
query78	9756	9812	9093	9093
query79	1042	882	575	575
query80	650	550	483	483
query81	482	259	234	234
query82	1409	158	121	121
query83	367	274	249	249
query84	277	119	97	97
query85	885	515	399	399
query86	343	296	291	291
query87	2857	2881	2756	2756
query88	3497	2563	2544	2544
query89	301	266	242	242
query90	1836	179	172	172
query91	167	168	134	134
query92	77	75	69	69
query93	1046	1038	658	658
query94	471	335	294	294
query95	574	332	371	332
query96	637	506	224	224
query97	2344	2362	2347	2347
query98	217	200	201	200
query99	581	558	551	551
Total cold run time: 245426 ms
Total hot run time: 172452 ms

@doris-robot

ClickBench: Total hot run time: 27.11 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 5401b3d51b5cf1d302dc2ba87cb535c614a2bc9a, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.05	0.04	0.04
query2	0.14	0.05	0.05
query3	0.27	0.09	0.08
query4	1.61	0.12	0.11
query5	0.28	0.24	0.27
query6	1.15	0.66	0.66
query7	0.03	0.03	0.02
query8	0.05	0.04	0.03
query9	0.57	0.50	0.51
query10	0.54	0.55	0.54
query11	0.15	0.10	0.10
query12	0.14	0.11	0.12
query13	0.60	0.58	0.58
query14	0.95	0.94	0.94
query15	0.79	0.78	0.78
query16	0.38	0.40	0.39
query17	1.04	1.07	0.98
query18	0.23	0.21	0.22
query19	1.97	1.87	1.87
query20	0.02	0.02	0.01
query21	15.44	0.28	0.13
query22	5.23	0.06	0.05
query23	16.07	0.28	0.11
query24	3.16	0.75	0.61
query25	0.10	0.07	0.08
query26	0.13	0.13	0.13
query27	0.08	0.04	0.05
query28	5.04	1.06	0.88
query29	12.57	3.92	3.15
query30	0.29	0.15	0.12
query31	2.82	0.62	0.40
query32	3.24	0.57	0.45
query33	3.06	3.04	3.14
query34	16.21	5.06	4.43
query35	4.45	4.41	4.41
query36	0.66	0.49	0.49
query37	0.12	0.07	0.06
query38	0.06	0.04	0.04
query39	0.06	0.03	0.03
query40	0.17	0.14	0.13
query41	0.10	0.03	0.03
query42	0.06	0.03	0.03
query43	0.04	0.04	0.04
Total cold run time: 100.12 s
Total hot run time: 27.11 s

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 45.45% (15/33) 🎉
Increment coverage report
Complete coverage report

@github-actions
Contributor

PR approved by anyone and no changes requested.

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 39.39% (13/33) 🎉
Increment coverage report
Complete coverage report

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jan 29, 2026
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@morningman morningman merged commit 3e5a70f into apache:master Jan 29, 2026
32 of 33 checks passed
@suxiaogang223 suxiaogang223 deleted the max_external_split_num branch February 13, 2026 02:20
suxiaogang223 added a commit to suxiaogang223/doris that referenced this pull request Feb 13, 2026
…revent OOM in file scan (apache#58759)

### What problem does this PR solve?

- Relate Pr: apache#58858

(cherry picked from commit 3e5a70f)
suxiaogang223 added a commit to suxiaogang223/doris that referenced this pull request Feb 13, 2026
suxiaogang223 added a commit to suxiaogang223/doris that referenced this pull request Feb 13, 2026
yiguolei pushed a commit that referenced this pull request Feb 14, 2026
…ariable to prevent OOM in file scan #58759 (#60732)

- Cherry-picked from #58759

7 participants