[opt](ann index) Make chunk size of index train configurable #58645

Merged
airborne12 merged 1 commit into apache:master from zhiqiang-hhhh:opt-index-build
Dec 4, 2025

@zhiqiang-hhhh
Contributor

@zhiqiang-hhhh zhiqiang-hhhh commented Dec 2, 2025

What problem does this PR solve?

Previous PR: #57623

The chunk size for index training and data ingestion is currently hard-coded to 1M, which makes index construction unnecessarily slow in some scenarios. It should be configurable so that it can be reduced when appropriate.

For example, with 1M vectors to add and a stream load batch size of 0.3M, the load is split into about 3 stream load requests. If a request carrying 0.3M vectors ends up with only 1 thread doing the add, the whole load is very slow. Typical CPU usage looks like this:
image

The batch size needs to be configurable so that it can be changed when necessary.

For example, with the batch size set to 30K, average CPU usage is higher, like this:
image

The default value is still 1M; a small batch size degrades the recall of the HNSW index.
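
To make the arithmetic in the example above concrete, here is a minimal sketch that counts how many chunks one 0.3M-row request is split into under each chunk size. The function name `chunks_per_request` and the constant are hypothetical and are not the names used in the Doris code; the sketch only counts chunks and assumes nothing about how Doris schedules them onto threads.

```python
def chunks_per_request(rows_in_request: int, chunk_size: int) -> int:
    """Return how many chunks a request is split into (ceiling division).

    Hypothetical helper for illustration only; not a Doris API.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if rows_in_request <= 0:
        return 0
    # Integer ceiling division avoids floating-point rounding.
    return (rows_in_request + chunk_size - 1) // chunk_size


# 0.3M vectors per stream load request, as in the example above.
ROWS_PER_REQUEST = 300_000

for chunk_size in (1_000_000, 30_000):
    n = chunks_per_request(ROWS_PER_REQUEST, chunk_size)
    print(f"chunk_size={chunk_size}: {n} chunk(s) per request")
```

With the 1M default, the whole request fits in a single chunk; with 30K, the same request becomes 10 smaller units of work, which is consistent with the higher average CPU usage reported above.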

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@zhiqiang-hhhh
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 34684 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a65427fa2d6beaca8002d6ddff78ade3a65ae701, data reload: false

------ Round 1 ----------------------------------
q1	17704	5216	4979	4979
q2	2040	326	211	211
q3	10249	1331	767	767
q4	10232	897	317	317
q5	7548	2451	2260	2260
q6	187	166	137	137
q7	1010	788	641	641
q8	9363	1472	1134	1134
q9	7094	5388	5332	5332
q10	6868	2211	1775	1775
q11	522	313	290	290
q12	342	375	224	224
q13	17788	3713	3121	3121
q14	234	227	211	211
q15	607	529	520	520
q16	900	858	820	820
q17	713	814	567	567
q18	8009	7270	6928	6928
q19	1107	975	629	629
q20	357	355	234	234
q21	4063	3889	2627	2627
q22	1045	1038	960	960
Total cold run time: 107982 ms
Total hot run time: 34684 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5097	5054	5018	5018
q2	341	417	317	317
q3	2132	2689	2289	2289
q4	1350	1757	1318	1318
q5	4398	4558	4511	4511
q6	225	172	130	130
q7	2072	1998	1822	1822
q8	2755	2658	2598	2598
q9	7566	7594	7629	7594
q10	3063	3255	2816	2816
q11	611	535	487	487
q12	669	738	610	610
q13	3528	3946	3449	3449
q14	289	291	279	279
q15	576	522	521	521
q16	907	901	882	882
q17	1276	1526	1420	1420
q18	7983	7834	7547	7547
q19	999	946	902	902
q20	1893	1975	1813	1813
q21	4695	4420	4196	4196
q22	1087	1035	993	993
Total cold run time: 53512 ms
Total hot run time: 51512 ms

@doris-robot

TPC-DS: Total hot run time: 182110 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit a65427fa2d6beaca8002d6ddff78ade3a65ae701, data reload: false

query1	1038	414	394	394
query2	6603	1208	1155	1155
query3	6744	230	228	228
query4	25121	23487	22911	22911
query5	4976	650	503	503
query6	345	254	226	226
query7	4659	527	302	302
query8	326	254	261	254
query9	8763	2625	2666	2625
query10	545	341	310	310
query11	15395	15126	14908	14908
query12	172	118	111	111
query13	1679	579	432	432
query14	9584	6024	6064	6024
query15	228	201	181	181
query16	7583	687	534	534
query17	1189	768	625	625
query18	2061	445	356	356
query19	215	211	185	185
query20	132	130	124	124
query21	219	144	118	118
query22	3937	3952	3887	3887
query23	32705	31654	32039	31654
query24	8479	2464	2450	2450
query25	656	556	497	497
query26	1245	281	177	177
query27	2719	510	347	347
query28	4379	2177	2162	2162
query29	849	661	532	532
query30	328	248	216	216
query31	828	702	638	638
query32	92	78	78	78
query33	627	405	341	341
query34	847	884	556	556
query35	811	854	751	751
query36	897	939	866	866
query37	139	113	94	94
query38	3951	3883	3761	3761
query39	1466	1415	1427	1415
query40	234	141	127	127
query41	72	73	69	69
query42	122	119	120	119
query43	453	458	416	416
query44	1376	768	760	760
query45	204	196	187	187
query46	930	1034	665	665
query47	1649	1746	1675	1675
query48	412	465	331	331
query49	817	531	475	475
query50	717	748	417	417
query51	3926	3942	3827	3827
query52	121	120	106	106
query53	256	273	201	201
query54	342	317	292	292
query55	98	95	94	94
query56	351	347	343	343
query57	1172	1162	1107	1107
query58	347	273	274	273
query59	2273	2356	2370	2356
query60	360	352	352	352
query61	171	167	171	167
query62	772	713	666	666
query63	236	196	199	196
query64	4509	1203	901	901
query65	4064	3990	3932	3932
query66	1115	436	334	334
query67	15146	15344	14849	14849
query68	8200	1012	625	625
query69	517	346	311	311
query70	1088	1047	1015	1015
query71	485	343	324	324
query72	5929	4829	4876	4829
query73	679	589	348	348
query74	8929	8739	8672	8672
query75	3669	3062	2551	2551
query76	3757	1160	765	765
query77	804	411	321	321
query78	9425	9792	8884	8884
query79	2001	864	605	605
query80	638	609	515	515
query81	507	275	242	242
query82	373	143	125	125
query83	274	264	258	258
query84	255	116	96	96
query85	954	504	452	452
query86	340	302	287	287
query87	4085	4108	3947	3947
query88	3889	2291	2281	2281
query89	390	338	309	309
query90	1984	225	218	218
query91	195	175	142	142
query92	86	70	65	65
query93	1405	1047	667	667
query94	683	443	345	345
query95	501	419	421	419
query96	542	557	298	298
query97	2601	2684	2576	2576
query98	242	217	218	217
query99	1345	1434	1236	1236
Total cold run time: 271549 ms
Total hot run time: 182110 ms

@doris-robot

ClickBench: Total hot run time: 27.28 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit a65427fa2d6beaca8002d6ddff78ade3a65ae701, data reload: false

query1	0.05	0.05	0.05
query2	0.10	0.04	0.04
query3	0.26	0.08	0.08
query4	1.60	0.12	0.11
query5	0.26	0.24	0.27
query6	1.16	0.64	0.63
query7	0.04	0.03	0.03
query8	0.06	0.04	0.05
query9	0.57	0.50	0.51
query10	0.56	0.57	0.56
query11	0.15	0.11	0.11
query12	0.15	0.11	0.12
query13	0.62	0.59	0.60
query14	0.99	0.96	1.00
query15	0.81	0.78	0.80
query16	0.40	0.39	0.40
query17	1.05	1.06	1.04
query18	0.23	0.22	0.22
query19	1.84	1.83	1.82
query20	0.01	0.02	0.01
query21	15.49	0.31	0.14
query22	4.81	0.05	0.05
query23	15.99	0.29	0.10
query24	1.30	0.44	0.33
query25	0.08	0.08	0.06
query26	0.14	0.14	0.13
query27	0.06	0.06	0.05
query28	3.68	1.23	1.02
query29	12.61	3.94	3.20
query30	0.28	0.14	0.12
query31	2.81	0.63	0.39
query32	3.24	0.55	0.46
query33	3.18	3.03	2.98
query34	16.75	5.16	4.61
query35	4.55	4.50	4.52
query36	0.66	0.50	0.48
query37	0.10	0.06	0.07
query38	0.08	0.04	0.04
query39	0.04	0.03	0.04
query40	0.19	0.14	0.13
query41	0.08	0.04	0.03
query42	0.04	0.03	0.03
query43	0.05	0.04	0.03
Total cold run time: 97.12 s
Total hot run time: 27.28 s

@doris-robot

BE UT Coverage Report

Increment line coverage 100.00% (9/9) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.51% (18735/35011)
Line Coverage 39.10% (172709/441711)
Region Coverage 33.69% (133657/396690)
Branch Coverage 34.65% (57546/166082)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (9/9) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 72.21% (24787/34327)
Line Coverage 58.88% (259861/441316)
Region Coverage 53.84% (216172/401500)
Branch Coverage 55.34% (92406/166979)

@zhiqiang-hhhh
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 35173 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 20d2e9afa1ea7985d0be2117cecc6b316b3bb980, data reload: false

------ Round 1 ----------------------------------
q1	17601	5063	4950	4950
q2	2038	360	211	211
q3	10231	1328	722	722
q4	10220	840	321	321
q5	7519	2469	2149	2149
q6	189	176	138	138
q7	962	792	636	636
q8	9344	1438	1101	1101
q9	6977	5338	5314	5314
q10	6793	2204	1825	1825
q11	527	317	287	287
q12	347	360	223	223
q13	17786	3674	3072	3072
q14	237	245	216	216
q15	589	518	515	515
q16	912	876	831	831
q17	690	779	549	549
q18	7368	7556	7965	7556
q19	1325	1032	617	617
q20	404	367	243	243
q21	4342	4152	2724	2724
q22	1060	1076	973	973
Total cold run time: 107461 ms
Total hot run time: 35173 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5170	5169	5164	5164
q2	335	409	359	359
q3	2374	2814	2551	2551
q4	1614	1804	1409	1409
q5	4506	4421	4514	4421
q6	204	174	124	124
q7	2001	1987	1862	1862
q8	2624	2544	2499	2499
q9	7558	7585	7552	7552
q10	3108	3339	2787	2787
q11	589	544	486	486
q12	691	769	628	628
q13	3497	3773	2974	2974
q14	273	281	255	255
q15	541	488	490	488
q16	873	891	853	853
q17	1108	1345	1379	1345
q18	7241	7235	6973	6973
q19	867	807	829	807
q20	1929	1962	1817	1817
q21	4635	4242	4169	4169
q22	1097	1018	974	974
Total cold run time: 52835 ms
Total hot run time: 50497 ms

@doris-robot

TPC-DS: Total hot run time: 182180 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 20d2e9afa1ea7985d0be2117cecc6b316b3bb980, data reload: false

query1	1073	412	407	407
query2	6587	1200	1161	1161
query3	6739	236	226	226
query4	25355	23365	22730	22730
query5	5857	651	508	508
query6	340	246	216	216
query7	4652	522	306	306
query8	319	260	245	245
query9	8752	2651	2642	2642
query10	559	364	309	309
query11	15304	15234	14882	14882
query12	179	116	114	114
query13	1680	557	436	436
query14	10863	5862	5918	5862
query15	258	202	189	189
query16	7666	674	527	527
query17	1577	808	629	629
query18	2072	463	335	335
query19	245	198	174	174
query20	130	131	120	120
query21	221	137	117	117
query22	3894	4059	3876	3876
query23	33061	31989	32121	31989
query24	8221	2416	2421	2416
query25	611	511	453	453
query26	1235	285	167	167
query27	2696	495	340	340
query28	4457	2169	2188	2169
query29	794	629	484	484
query30	321	245	212	212
query31	819	707	621	621
query32	83	78	71	71
query33	597	393	326	326
query34	830	888	560	560
query35	792	812	735	735
query36	903	936	839	839
query37	127	115	92	92
query38	3872	3879	3843	3843
query39	1488	1411	1409	1409
query40	235	137	126	126
query41	66	64	66	64
query42	130	125	110	110
query43	444	437	416	416
query44	1327	770	749	749
query45	204	194	182	182
query46	910	1022	643	643
query47	1689	1750	1641	1641
query48	408	436	334	334
query49	769	491	423	423
query50	697	725	407	407
query51	3997	3853	3831	3831
query52	122	121	112	112
query53	248	263	214	214
query54	343	328	302	302
query55	100	98	90	90
query56	359	367	356	356
query57	1137	1156	1097	1097
query58	308	285	290	285
query59	2367	2483	2368	2368
query60	389	365	351	351
query61	184	191	183	183
query62	797	719	668	668
query63	235	199	205	199
query64	4618	1307	1020	1020
query65	4057	3973	4015	3973
query66	1078	465	362	362
query67	15315	14950	14908	14908
query68	8409	1007	641	641
query69	533	356	325	325
query70	1147	1018	993	993
query71	480	354	345	345
query72	5993	4829	4799	4799
query73	684	596	351	351
query74	8598	8862	8721	8721
query75	3091	3056	2520	2520
query76	3283	1154	759	759
query77	522	414	315	315
query78	9443	9679	8937	8937
query79	2279	825	586	586
query80	670	588	494	494
query81	520	280	245	245
query82	466	140	115	115
query83	276	268	252	252
query84	259	131	101	101
query85	881	502	450	450
query86	334	295	276	276
query87	4021	4116	4017	4017
query88	4138	2305	2339	2305
query89	390	346	293	293
query90	2061	240	216	216
query91	175	169	140	140
query92	89	66	67	66
query93	2093	1041	671	671
query94	754	468	343	343
query95	507	428	421	421
query96	568	572	285	285
query97	2656	2676	2552	2552
query98	263	234	230	230
query99	1315	1392	1265	1265
Total cold run time: 274660 ms
Total hot run time: 182180 ms

@doris-robot

ClickBench: Total hot run time: 27.46 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 20d2e9afa1ea7985d0be2117cecc6b316b3bb980, data reload: false

query1	0.06	0.05	0.05
query2	0.10	0.05	0.05
query3	0.26	0.08	0.08
query4	1.61	0.12	0.11
query5	0.26	0.25	0.25
query6	1.17	0.65	0.64
query7	0.03	0.03	0.02
query8	0.06	0.04	0.05
query9	0.58	0.52	0.51
query10	0.56	0.55	0.55
query11	0.17	0.11	0.11
query12	0.15	0.12	0.11
query13	0.63	0.62	0.61
query14	1.00	1.00	0.99
query15	0.82	0.79	0.81
query16	0.41	0.41	0.39
query17	0.99	1.02	1.03
query18	0.25	0.22	0.22
query19	1.93	1.82	1.79
query20	0.01	0.01	0.01
query21	15.44	0.29	0.14
query22	4.86	0.05	0.04
query23	16.08	0.29	0.10
query24	2.24	0.78	0.41
query25	0.13	0.06	0.04
query26	0.14	0.13	0.14
query27	0.05	0.06	0.07
query28	5.98	1.21	1.02
query29	12.60	4.04	3.20
query30	0.27	0.13	0.12
query31	2.83	0.64	0.40
query32	3.22	0.56	0.48
query33	2.98	3.08	3.15
query34	16.92	5.16	4.57
query35	4.57	4.47	4.54
query36	0.66	0.51	0.49
query37	0.11	0.07	0.07
query38	0.07	0.04	0.04
query39	0.05	0.03	0.03
query40	0.18	0.15	0.14
query41	0.09	0.04	0.03
query42	0.04	0.03	0.03
query43	0.05	0.04	0.03
Total cold run time: 100.61 s
Total hot run time: 27.46 s

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 100.00% (9/9) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.51% (18735/35011)
Line Coverage 39.10% (172717/441711)
Region Coverage 33.71% (133712/396690)
Branch Coverage 34.66% (57559/166082)

Contributor

@zzzxl1993 zzzxl1993 left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Dec 3, 2025
@github-actions
Contributor

github-actions bot commented Dec 3, 2025

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

github-actions bot commented Dec 3, 2025

PR approved by anyone and no changes requested.

Member

@airborne12 airborne12 left a comment

LGTM

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (9/9) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 72.25% (24801/34327)
Line Coverage 58.89% (259871/441316)
Region Coverage 53.66% (215439/401500)
Branch Coverage 55.29% (92318/166979)

@airborne12 airborne12 merged commit d50a794 into apache:master Dec 4, 2025
30 of 31 checks passed
github-actions bot pushed a commit that referenced this pull request Dec 4, 2025
### What problem does this PR solve?
Previous PR: #57623

The chunk size for index training and data ingestion is currently
hard-coded to 1M, which makes index construction unnecessarily slow in
some scenarios. It should be configurable so that it can be reduced
when appropriate.

For example, with 1M vectors to add and a stream load batch size of
0.3M, the load is split into about 3 stream load requests. If a request
carrying 0.3M vectors ends up with only 1 thread doing the add, the
whole load is very slow. Typical CPU usage looks like this:
<img width="1902" height="552" alt="image"
src="https://github.com/user-attachments/assets/65728e56-f333-4bd5-a54a-8c12d01668f1"
/>

The batch size needs to be configurable so that it can be changed when
necessary.

For example, with the batch size set to 30K, average CPU usage is
higher, like this:
<img width="1890" height="554" alt="image"
src="https://github.com/user-attachments/assets/7d664b0e-b017-4a2e-bed8-e40f56ff97b7"
/>

**The default value is still 1M; a small batch size degrades the
recall of the HNSW index.**
@zhiqiang-hhhh zhiqiang-hhhh deleted the opt-index-build branch December 4, 2025 13:06
yiguolei pushed a commit that referenced this pull request Dec 5, 2025
…ble #58645 (#58727)

Cherry-picked from #58645

Co-authored-by: zhiqiang <seuhezhiqiang@163.com>
nagisa-kunhah pushed a commit to nagisa-kunhah/doris that referenced this pull request Dec 14, 2025
…58645)

Labels

approved Indicates a PR has been approved by one committer. dev/4.0.2-merged reviewed

6 participants