[improve](ann index) Accumulate multiple small batches before training #57623

Merged
airborne12 merged 5 commits into apache:master from uchenily:ann-optimize
Nov 11, 2025

@uchenily (Contributor) commented Nov 3, 2025

What problem does this PR solve?

Accumulate multiple small batches before training. This avoids the following error, which occurs when a batch contains fewer vectors than the number of clusters:
`Error: 'nx >= k' failed: Number of training points should be at least as large as number of clusters`
It also significantly reduces the time spent in faiss train/add.
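
The accumulation idea can be sketched as follows. This is a minimal illustration, not the code merged in this PR: `BatchAccumulator`, `min_rows`, and the `Sink` callback (standing in for the faiss train/add call) are hypothetical names, and plain `std::vector` is used for the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper, not the Doris implementation: buffers row-major
// float vectors until at least `min_rows` rows are available, then hands
// the whole buffer to a callback that stands in for faiss train/add.
class BatchAccumulator {
public:
    using Sink = std::function<void(const float* data, std::size_t rows)>;

    BatchAccumulator(std::size_t dim, std::size_t min_rows, Sink sink)
            : _dim(dim), _min_rows(min_rows), _sink(std::move(sink)) {
        assert(dim > 0);
    }

    // Append `rows` vectors of `_dim` floats each and flush once enough
    // rows have been buffered.
    void add(const float* data, std::size_t rows) {
        _buffer.insert(_buffer.end(), data, data + rows * _dim);
        if (buffered_rows() >= _min_rows) {
            flush();
        }
    }

    // Send whatever is buffered to the sink. Call once at the end of the
    // load to drain the tail, which may be smaller than `min_rows`.
    void flush() {
        if (_buffer.empty()) {
            return;
        }
        _sink(_buffer.data(), buffered_rows());
        _buffer.clear();
    }

    std::size_t buffered_rows() const { return _buffer.size() / _dim; }

private:
    std::size_t _dim;
    std::size_t _min_rows;
    Sink _sink;
    std::vector<float> _buffer;
};
```

With `min_rows` set at or above the cluster count, every threshold-triggered call to the sink satisfies `nx >= k`. The final `flush()` can still deliver a short tail, so a real writer has to decide how to handle it (for example, by only adding, not training, on the tail).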

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@uchenily (Contributor Author) commented Nov 3, 2025

run buildall

@doris-robot

ClickBench: Total hot run time: 27.76 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 88c3ecacaf859194815967aeb7ea0c2d36b7a9e8, data reload: false

query1	0.05	0.04	0.04
query2	0.10	0.05	0.06
query3	0.25	0.08	0.08
query4	1.61	0.12	0.12
query5	0.29	0.28	0.25
query6	1.16	0.64	0.66
query7	0.03	0.02	0.03
query8	0.05	0.04	0.04
query9	0.62	0.53	0.52
query10	0.58	0.57	0.58
query11	0.16	0.11	0.12
query12	0.16	0.12	0.13
query13	0.62	0.60	0.60
query14	1.03	1.00	1.02
query15	0.87	0.83	0.84
query16	0.40	0.40	0.40
query17	1.03	1.04	1.05
query18	0.21	0.23	0.20
query19	1.95	1.85	1.80
query20	0.01	0.01	0.02
query21	15.44	0.19	0.13
query22	5.14	0.07	0.05
query23	15.69	0.26	0.10
query24	3.15	0.59	0.54
query25	0.07	0.07	0.07
query26	0.16	0.15	0.13
query27	0.07	0.06	0.06
query28	4.91	1.14	0.94
query29	12.62	3.93	3.27
query30	0.28	0.14	0.12
query31	2.81	0.59	0.38
query32	3.23	0.56	0.47
query33	3.06	3.04	3.04
query34	15.97	5.17	4.57
query35	4.59	4.58	4.58
query36	0.69	0.52	0.50
query37	0.10	0.06	0.06
query38	0.06	0.04	0.04
query39	0.04	0.02	0.03
query40	0.19	0.15	0.15
query41	0.09	0.04	0.03
query42	0.04	0.03	0.03
query43	0.04	0.04	0.04
Total cold run time: 99.62 s
Total hot run time: 27.76 s

@hello-stephen (Contributor)

BE UT Coverage Report

Increment line coverage 69.57% (16/23) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.70% (18048/34249)
Line Coverage 37.99% (164064/431862)
Region Coverage 32.30% (124964/386918)
Branch Coverage 33.74% (54717/162194)

@uchenily (Contributor Author) commented Nov 3, 2025

run buildall

@doris-robot

ClickBench: Total hot run time: 29.09 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 295be19d0cb6a0f46254cd373bc338477bf9a539, data reload: false

query1	0.06	0.06	0.05
query2	0.10	0.05	0.06
query3	0.26	0.09	0.09
query4	1.62	0.12	0.12
query5	0.28	0.28	0.27
query6	1.21	0.68	0.67
query7	0.04	0.02	0.03
query8	0.07	0.05	0.05
query9	0.66	0.57	0.58
query10	0.62	0.62	0.62
query11	0.19	0.13	0.14
query12	0.19	0.14	0.14
query13	0.64	0.62	0.61
query14	1.03	1.04	1.02
query15	0.89	0.87	0.91
query16	0.45	0.42	0.42
query17	1.14	1.12	1.24
query18	0.24	0.22	0.21
query19	2.01	1.91	1.99
query20	0.02	0.02	0.02
query21	15.38	0.22	0.16
query22	5.00	0.08	0.06
query23	15.60	0.31	0.12
query24	2.90	0.59	0.47
query25	0.10	0.08	0.07
query26	0.17	0.16	0.16
query27	0.08	0.06	0.06
query28	4.50	1.16	0.96
query29	12.62	4.55	3.79
query30	0.34	0.16	0.12
query31	2.84	0.65	0.42
query32	3.25	0.60	0.50
query33	3.08	3.09	3.30
query34	16.06	5.22	4.65
query35	4.64	4.62	4.56
query36	0.69	0.55	0.52
query37	0.12	0.08	0.08
query38	0.07	0.05	0.05
query39	0.05	0.03	0.03
query40	0.19	0.16	0.16
query41	0.10	0.04	0.03
query42	0.05	0.04	0.03
query43	0.05	0.04	0.04
Total cold run time: 99.6 s
Total hot run time: 29.09 s

@hello-stephen (Contributor)

BE UT Coverage Report

Increment line coverage 69.57% (16/23) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.70% (18048/34249)
Line Coverage 37.99% (164071/431862)
Region Coverage 32.32% (125044/386918)
Branch Coverage 33.74% (54718/162194)

@hello-stephen (Contributor)

BE Regression && UT Coverage Report

Increment line coverage 69.57% (16/23) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.44% (24052/33668)
Line Coverage 57.81% (250092/432607)
Region Coverage 52.71% (206883/392514)
Branch Coverage 54.51% (89022/163303)

// VectorIndex should be weak shared by AnnIndexWriter and VectorIndexReader
// This should be a weak_ptr
std::shared_ptr<VectorIndex> _vector_index;
std::vector<float> _ann_vec;
Contributor

Replace std::vector with DorisVector for memory safety.


if (i > 0) {
vectorized::Int64 offset = i * dim;
std::copy(_ann_vec.begin() + offset, _ann_vec.end(), _ann_vec.begin());
Contributor

The cost of this memory copy can be reduced by using std::list<std::shared_ptr<DorisVector>>.
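
A rough sketch of what the reviewer's suggestion could look like: instead of shifting the unconsumed tail to the front of one flat vector with `std::copy`, data is kept in fixed-size chunks held in a list, and a full chunk is detached from the front without moving any of the remaining floats. `ChunkQueue` and its members are hypothetical names, and `std::vector<float>` stands in for `DorisVector`, whose interface is not shown in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>
#include <utility>
#include <vector>

// Stand-in for DorisVector<float>; an assumption made for this sketch.
using Chunk = std::vector<float>;

// Hypothetical chunked buffer: floats are appended into fixed-size chunks.
class ChunkQueue {
public:
    explicit ChunkQueue(std::size_t chunk_floats) : _chunk_floats(chunk_floats) {
        assert(chunk_floats > 0);
    }

    // Append n floats, filling the last chunk and opening new ones as needed.
    void append(const float* data, std::size_t n) {
        while (n > 0) {
            if (_chunks.empty() || _chunks.back()->size() == _chunk_floats) {
                auto chunk = std::make_shared<Chunk>();
                chunk->reserve(_chunk_floats);
                _chunks.push_back(std::move(chunk));
            }
            Chunk& tail = *_chunks.back();
            const std::size_t take = std::min(n, _chunk_floats - tail.size());
            tail.insert(tail.end(), data, data + take);
            data += take;
            n -= take;
        }
    }

    // True when the front chunk is full and ready to be consumed.
    bool has_full_chunk() const {
        return !_chunks.empty() && _chunks.front()->size() == _chunk_floats;
    }

    // Detach the front chunk in O(1); the remaining chunks are not copied.
    std::shared_ptr<Chunk> pop_front() {
        assert(!_chunks.empty());
        std::shared_ptr<Chunk> chunk = std::move(_chunks.front());
        _chunks.pop_front();
        return chunk;
    }

    std::size_t chunk_count() const { return _chunks.size(); }

private:
    std::size_t _chunk_floats;
    std::list<std::shared_ptr<Chunk>> _chunks;
};
```

The trade-off is one extra allocation per chunk in exchange for never copying the tail. The snippets quoted later in this review use a single preallocated `_float_array` instead, which avoids the shift in a different way.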

@uchenily (Contributor Author)

run buildall

@doris-robot

TPC-H: Total hot run time: 34427 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit bc19e89dbabb58e55e36f4453919414a7084ef94, data reload: false

------ Round 1 ----------------------------------
q1	17644	5232	5077	5077
q2	2026	321	206	206
q3	10271	1317	712	712
q4	10222	886	359	359
q5	7510	2444	2380	2380
q6	189	170	135	135
q7	967	776	620	620
q8	9353	1391	1172	1172
q9	6971	5389	5169	5169
q10	6898	2233	1828	1828
q11	520	304	281	281
q12	331	365	228	228
q13	17806	3663	3077	3077
q14	231	237	215	215
q15	573	518	502	502
q16	1031	1011	938	938
q17	593	851	391	391
q18	7556	7131	7021	7021
q19	1307	959	567	567
q20	351	340	228	228
q21	3711	3215	2337	2337
q22	1065	1022	984	984
Total cold run time: 107126 ms
Total hot run time: 34427 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5169	5103	5121	5103
q2	247	332	226	226
q3	2174	2753	2286	2286
q4	1337	1742	1309	1309
q5	4208	4658	4419	4419
q6	214	172	128	128
q7	2043	1984	1870	1870
q8	2696	2533	2514	2514
q9	7413	7233	7392	7233
q10	3092	3346	2827	2827
q11	612	534	503	503
q12	701	795	654	654
q13	3810	3907	3298	3298
q14	306	303	266	266
q15	542	517	517	517
q16	1049	1126	1101	1101
q17	1196	1540	1477	1477
q18	8121	7669	7689	7669
q19	806	889	1103	889
q20	2092	2032	1885	1885
q21	4787	4277	4304	4277
q22	1086	1044	987	987
Total cold run time: 53701 ms
Total hot run time: 51438 ms

@doris-robot

TPC-DS: Total hot run time: 187644 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit bc19e89dbabb58e55e36f4453919414a7084ef94, data reload: false

query1	1052	407	397	397
query2	6579	1674	1701	1674
query3	6758	232	233	232
query4	26106	23909	23113	23113
query5	4422	623	480	480
query6	322	255	212	212
query7	4645	494	287	287
query8	300	259	261	259
query9	8708	2567	2571	2567
query10	507	348	296	296
query11	15611	15064	14911	14911
query12	174	117	117	117
query13	1676	576	441	441
query14	10518	9100	9174	9100
query15	195	192	172	172
query16	7380	701	517	517
query17	1257	787	639	639
query18	2006	433	337	337
query19	208	211	190	190
query20	137	133	120	120
query21	212	138	117	117
query22	4105	4059	4026	4026
query23	34290	32807	32938	32807
query24	8420	2420	2399	2399
query25	638	564	493	493
query26	1240	282	160	160
query27	2726	501	349	349
query28	4374	2198	2187	2187
query29	850	642	508	508
query30	308	232	195	195
query31	902	823	737	737
query32	83	75	73	73
query33	601	381	349	349
query34	791	848	512	512
query35	789	839	755	755
query36	1006	966	910	910
query37	122	113	88	88
query38	3486	3567	3506	3506
query39	1472	1444	1428	1428
query40	220	130	123	123
query41	61	58	63	58
query42	126	113	111	111
query43	478	501	479	479
query44	1214	749	730	730
query45	188	178	171	171
query46	865	973	632	632
query47	1786	1822	1736	1736
query48	391	419	313	313
query49	773	505	397	397
query50	638	687	399	399
query51	3975	3920	4032	3920
query52	117	107	104	104
query53	234	262	193	193
query54	318	290	271	271
query55	83	89	82	82
query56	323	338	300	300
query57	1167	1182	1113	1113
query58	287	268	269	268
query59	2604	2615	2472	2472
query60	347	343	323	323
query61	196	160	157	157
query62	784	759	680	680
query63	229	192	194	192
query64	4538	1148	853	853
query65	4008	3978	3917	3917
query66	1183	439	356	356
query67	15297	15239	14944	14944
query68	8479	927	595	595
query69	492	321	294	294
query70	1323	1281	1281	1281
query71	489	341	315	315
query72	5981	5032	4855	4855
query73	693	585	370	370
query74	8991	9262	8944	8944
query75	3933	3313	2857	2857
query76	3736	1178	731	731
query77	799	387	335	335
query78	9465	9564	8933	8933
query79	3167	849	594	594
query80	693	579	500	500
query81	495	262	227	227
query82	451	160	128	128
query83	309	278	263	263
query84	296	120	92	92
query85	920	481	461	461
query86	339	315	284	284
query87	3810	3713	3693	3693
query88	3295	2289	2226	2226
query89	425	342	297	297
query90	1977	221	217	217
query91	180	173	140	140
query92	85	68	62	62
query93	2325	977	641	641
query94	725	414	351	351
query95	414	318	319	318
query96	496	571	281	281
query97	2956	3003	2889	2889
query98	249	225	210	210
query99	1445	1419	1306	1306
Total cold run time: 276839 ms
Total hot run time: 187644 ms

@doris-robot

ClickBench: Total hot run time: 27.73 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit bc19e89dbabb58e55e36f4453919414a7084ef94, data reload: false

query1	0.06	0.04	0.04
query2	0.09	0.05	0.06
query3	0.26	0.09	0.08
query4	1.61	0.12	0.12
query5	0.28	0.24	0.26
query6	1.20	0.64	0.65
query7	0.04	0.03	0.03
query8	0.05	0.04	0.05
query9	0.59	0.54	0.52
query10	0.57	0.58	0.57
query11	0.16	0.11	0.12
query12	0.16	0.12	0.12
query13	0.62	0.60	0.60
query14	1.01	1.01	1.01
query15	0.85	0.83	0.84
query16	0.37	0.38	0.40
query17	1.07	1.01	1.03
query18	0.22	0.20	0.21
query19	1.87	1.85	1.79
query20	0.02	0.02	0.02
query21	15.44	0.20	0.13
query22	5.10	0.07	0.05
query23	15.66	0.25	0.10
query24	2.39	0.69	1.01
query25	0.08	0.07	0.06
query26	0.15	0.14	0.14
query27	0.07	0.06	0.05
query28	4.59	1.16	0.93
query29	12.59	3.86	3.20
query30	0.28	0.15	0.11
query31	2.82	0.58	0.39
query32	3.23	0.55	0.46
query33	2.98	3.10	3.07
query34	15.89	5.19	4.53
query35	4.59	4.58	4.59
query36	0.67	0.51	0.49
query37	0.10	0.07	0.06
query38	0.08	0.04	0.04
query39	0.04	0.03	0.03
query40	0.16	0.14	0.14
query41	0.09	0.04	0.03
query42	0.03	0.03	0.02
query43	0.05	0.03	0.04
Total cold run time: 98.18 s
Total hot run time: 27.73 s

@doris-robot

BE UT Coverage Report

Increment line coverage 100.00% (31/31) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.78% (18226/34533)
Line Coverage 38.13% (165780/434744)
Region Coverage 33.12% (128911/389206)
Branch Coverage 33.87% (55321/163336)

@hello-stephen (Contributor)

BE Regression && UT Coverage Report

Increment line coverage 100.00% (31/31) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.56% (24294/33947)
Line Coverage 58.06% (252845/435464)
Region Coverage 53.36% (210645/394727)
Branch Coverage 54.67% (89895/164419)


size_t block_size = CHUNK_SIZE * build_parameter.dim;
// The array capacity will not change after resizing
_float_array.resize(block_size);
Contributor

Use reserve instead of resize.

size_t block_size = CHUNK_SIZE * build_parameter.dim;
// The array capacity will not change after resizing
_float_array.resize(block_size);
_array_offset = 0;
Contributor

_array_offset is not needed
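
The two review comments above fit together, as this small sketch suggests. `resize(n)` sets `size()` to `n` and value-initializes the elements, so a separate write offset is needed to know how much is filled; `reserve(n)` only allocates, so after appending with `insert`, `size()` itself is the write offset. `FloatBuffer` is a hypothetical stand-in using `std::vector<float>`, not the writer class from this PR.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the writer's float buffer.
class FloatBuffer {
public:
    explicit FloatBuffer(std::size_t capacity_floats) {
        // reserve() allocates storage but leaves size() at 0.
        _data.reserve(capacity_floats);
    }

    // Append n floats; size() grows by n, so it doubles as the write offset.
    void append(const float* src, std::size_t n) {
        _data.insert(_data.end(), src, src + n);
    }

    // Number of floats written so far; no separate offset member is kept.
    std::size_t filled() const { return _data.size(); }

    std::size_t capacity() const { return _data.capacity(); }

    const float* data() const { return _data.data(); }

    // Drop the contents but keep the allocation for the next chunk.
    void reset() { _data.clear(); }

private:
    std::vector<float> _data;
};
```

As long as appends stay within the reserved capacity, no reallocation happens, which matches the intent of the original "capacity will not change" comment.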

@github-actions (Contributor)

PR approved by anyone and no changes requested.

airborne12 (Member) previously approved these changes Nov 10, 2025

LGTM

github-actions bot added the approved label (Indicates a PR has been approved by one committer.) Nov 10, 2025
@github-actions (Contributor)

PR approved by at least one committer and no changes requested.

@uchenily (Contributor Author)

run buildall

github-actions bot removed the approved label (Indicates a PR has been approved by one committer.) Nov 10, 2025
@doris-robot

TPC-H: Total hot run time: 36144 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b7ffe325460f56d8806600839d8417c42e126e3d, data reload: false

------ Round 1 ----------------------------------
q1	17623	5218	5069	5069
q2	2039	332	200	200
q3	10280	1308	745	745
q4	10282	934	372	372
q5	8244	2404	2424	2404
q6	205	167	135	135
q7	926	768	622	622
q8	9349	1411	1150	1150
q9	7445	5214	5183	5183
q10	6926	2233	1789	1789
q11	503	292	281	281
q12	365	368	224	224
q13	17789	3659	3038	3038
q14	226	244	209	209
q15	587	499	519	499
q16	1024	1002	946	946
q17	591	897	373	373
q18	7450	7778	7745	7745
q19	1616	985	600	600
q20	354	367	237	237
q21	4356	3761	3271	3271
q22	1097	1105	1052	1052
Total cold run time: 109277 ms
Total hot run time: 36144 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5511	5322	5417	5322
q2	262	336	223	223
q3	2470	2928	2513	2513
q4	1467	1886	1504	1504
q5	4718	4542	4266	4266
q6	221	166	126	126
q7	2005	1956	1843	1843
q8	2606	2562	2568	2562
q9	7342	7601	7259	7259
q10	2953	3131	2682	2682
q11	569	511	494	494
q12	637	737	589	589
q13	3306	3623	3021	3021
q14	280	280	261	261
q15	544	490	484	484
q16	1011	1049	1003	1003
q17	1126	1450	1330	1330
q18	7304	7278	6967	6967
q19	777	731	759	731
q20	1949	1992	1808	1808
q21	4717	4359	4294	4294
q22	1099	1069	986	986
Total cold run time: 52874 ms
Total hot run time: 50268 ms

@doris-robot

TPC-DS: Total hot run time: 187633 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b7ffe325460f56d8806600839d8417c42e126e3d, data reload: false

query1	1036	411	393	393
query2	6560	1697	1718	1697
query3	6757	227	220	220
query4	26185	23770	23225	23225
query5	4445	650	469	469
query6	341	242	232	232
query7	4652	508	297	297
query8	312	267	252	252
query9	8720	2584	2557	2557
query10	484	364	289	289
query11	15720	15035	14848	14848
query12	185	118	112	112
query13	1678	557	441	441
query14	11250	9062	9163	9062
query15	200	185	175	175
query16	7675	667	510	510
query17	1248	743	608	608
query18	2024	413	347	347
query19	206	205	187	187
query20	134	122	119	119
query21	214	128	116	116
query22	3970	4052	4045	4045
query23	34149	32828	33099	32828
query24	8408	2413	2418	2413
query25	610	514	437	437
query26	1240	269	154	154
query27	2760	490	346	346
query28	4394	2243	2178	2178
query29	801	602	476	476
query30	294	223	200	200
query31	935	796	713	713
query32	87	73	76	73
query33	594	367	335	335
query34	795	861	530	530
query35	805	827	718	718
query36	961	990	907	907
query37	126	109	90	90
query38	3555	3579	3439	3439
query39	1472	1411	1474	1411
query40	230	130	120	120
query41	78	61	64	61
query42	127	112	113	112
query43	491	491	445	445
query44	1245	758	745	745
query45	187	185	182	182
query46	882	973	635	635
query47	1804	1826	1711	1711
query48	396	434	313	313
query49	771	536	436	436
query50	636	684	401	401
query51	3939	4115	3923	3923
query52	110	107	105	105
query53	240	267	203	203
query54	318	303	278	278
query55	91	88	83	83
query56	344	314	306	306
query57	1182	1194	1140	1140
query58	287	275	285	275
query59	2517	2797	2583	2583
query60	375	367	336	336
query61	194	212	194	194
query62	799	722	665	665
query63	224	197	192	192
query64	4588	1282	968	968
query65	4043	3939	4002	3939
query66	1121	486	359	359
query67	15485	15190	14788	14788
query68	8342	931	598	598
query69	491	328	291	291
query70	1293	1291	1339	1291
query71	468	342	308	308
query72	6067	4926	4864	4864
query73	621	577	361	361
query74	9411	9043	8972	8972
query75	3545	3334	2859	2859
query76	3391	1163	722	722
query77	673	391	339	339
query78	9706	9646	8910	8910
query79	2792	851	594	594
query80	737	585	510	510
query81	528	261	229	229
query82	677	155	128	128
query83	269	263	260	260
query84	257	109	88	88
query85	955	479	452	452
query86	393	324	304	304
query87	3692	3796	3645	3645
query88	4088	2253	2289	2253
query89	391	328	302	302
query90	2060	224	219	219
query91	171	168	135	135
query92	83	70	67	67
query93	2556	1004	632	632
query94	706	407	348	348
query95	408	318	319	318
query96	500	573	285	285
query97	2932	2985	2904	2904
query98	239	212	210	210
query99	1317	1433	1288	1288
Total cold run time: 278445 ms
Total hot run time: 187633 ms

@doris-robot

ClickBench: Total hot run time: 27.84 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit b7ffe325460f56d8806600839d8417c42e126e3d, data reload: false

query1	0.05	0.05	0.05
query2	0.09	0.05	0.05
query3	0.25	0.08	0.08
query4	1.60	0.11	0.12
query5	0.26	0.26	0.24
query6	1.18	0.64	0.63
query7	0.04	0.03	0.03
query8	0.05	0.04	0.04
query9	0.59	0.52	0.51
query10	0.57	0.57	0.57
query11	0.17	0.12	0.12
query12	0.16	0.11	0.12
query13	0.63	0.61	0.60
query14	1.02	1.00	0.99
query15	0.85	0.84	0.84
query16	0.38	0.40	0.42
query17	1.01	1.03	1.01
query18	0.22	0.19	0.20
query19	1.90	1.84	1.76
query20	0.02	0.01	0.02
query21	15.43	0.21	0.14
query22	4.98	0.08	0.05
query23	15.67	0.27	0.10
query24	3.56	0.72	1.19
query25	0.10	0.06	0.06
query26	0.15	0.13	0.13
query27	0.07	0.05	0.05
query28	5.33	1.12	0.93
query29	12.53	4.02	3.38
query30	0.28	0.14	0.12
query31	2.82	0.59	0.38
query32	3.23	0.54	0.48
query33	3.13	3.06	3.03
query34	15.80	5.20	4.54
query35	4.57	4.53	4.62
query36	0.69	0.50	0.49
query37	0.10	0.07	0.07
query38	0.06	0.05	0.03
query39	0.04	0.03	0.03
query40	0.16	0.14	0.14
query41	0.08	0.03	0.03
query42	0.04	0.03	0.03
query43	0.05	0.04	0.04
Total cold run time: 99.91 s
Total hot run time: 27.84 s

@doris-robot

BE UT Coverage Report

Increment line coverage 100.00% (29/29) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.78% (18226/34533)
Line Coverage 38.13% (165789/434751)
Region Coverage 33.17% (129099/389211)
Branch Coverage 33.88% (55334/163340)

@uchenily (Contributor Author)

run cloud_p0

@uchenily (Contributor Author)

run external

@hello-stephen (Contributor)

BE Regression && UT Coverage Report

Increment line coverage 100.00% (29/29) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.53% (24282/33947)
Line Coverage 58.02% (252646/435471)
Region Coverage 53.47% (211070/394732)
Branch Coverage 54.67% (89889/164423)

@airborne12 (Member) left a comment

LGTM

github-actions bot added the approved label (Indicates a PR has been approved by one committer.) Nov 11, 2025
@github-actions (Contributor)

PR approved by at least one committer and no changes requested.

airborne12 merged commit 20302fe into apache:master on Nov 11, 2025
26 of 27 checks passed
zhiqiang-hhhh pushed a commit to zhiqiang-hhhh/doris that referenced this pull request Nov 12, 2025
…apache#57623)

Accumulate multiple small batches to avoid the following error when
training:
`Error: 'nx >= k' failed: Number of training points should be at least
as large as number of clusters`,
and significantly reduce the time for faiss train/add.
yiguolei pushed a commit that referenced this pull request Nov 12, 2025
…ore training #57623 (#57932)

cherry pick from #57623

Co-authored-by: ivin <uchenily@qq.com>
wyxxxcat pushed a commit to wyxxxcat/doris that referenced this pull request Nov 13, 2025
…apache#57623)

### What problem does this PR solve?

Accumulate multiple small batches to avoid the following error when
training:
`Error: 'nx >= k' failed: Number of training points should be at least
as large as number of clusters`,
and significantly reduce the time for faiss train/add.
wyxxxcat pushed a commit to wyxxxcat/doris that referenced this pull request Nov 18, 2025
…apache#57623)

@yiguolei yiguolei mentioned this pull request Dec 2, 2025
airborne12 pushed a commit that referenced this pull request Dec 4, 2025
### What problem does this PR solve?
Previous pr: #57623

The granularity for index training and data ingestion is currently
hard-coded to 1M, which makes index construction unnecessarily slow in
some scenarios. It should be configurable so that it can be reduced when
appropriate.

For example, with 1M vectors to add and a stream load batch size of
0.3M, there will be 3 stream load requests. If a request carrying 0.3M
vectors ends up with only 1 thread for adding, the whole load becomes
very slow. Typical CPU usage looks like this:
<img width="1902" height="552" alt="image"
src="https://github.com/user-attachments/assets/65728e56-f333-4bd5-a54a-8c12d01668f1"
/>

We need to make the batch size configurable so that it can be changed
when needed.

For example, with the batch size set to 30K, average CPU usage is
higher:
<img width="1890" height="554" alt="image"
src="https://github.com/user-attachments/assets/7d664b0e-b017-4a2e-bed8-e40f56ff97b7"
/>

**The default value is still 1M; a small batch size will hurt the
recall of HNSW.**
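
To make the granularity arithmetic in this description concrete, here is a small sketch. `chunk_sizes` is a hypothetical helper that only counts how many train/add chunks a request is cut into at a given batch size; it does not model how Doris schedules threads.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: sizes of the train/add chunks produced when
// `total_rows` buffered vectors are cut at a `chunk_rows` granularity.
std::vector<std::size_t> chunk_sizes(std::size_t total_rows, std::size_t chunk_rows) {
    std::vector<std::size_t> sizes;
    if (chunk_rows == 0) {
        return sizes; // guard: a zero batch size yields no chunks
    }
    for (std::size_t left = total_rows; left > 0;) {
        const std::size_t take = left < chunk_rows ? left : chunk_rows;
        sizes.push_back(take);
        left -= take;
    }
    return sizes;
}
```

Under these assumptions, a 0.3M-vector request at the 1M granularity forms a single chunk, while the same request at 30K forms ten chunks. Whether those chunks actually run in parallel, and how much recall is lost at smaller sizes, depends on behavior described only in the text above.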
github-actions bot pushed a commit that referenced this pull request Dec 4, 2025
nagisa-kunhah pushed a commit to nagisa-kunhah/doris that referenced this pull request Dec 14, 2025
…58645)

Labels

approved (Indicates a PR has been approved by one committer.), dev/4.0.2-merged, reviewed


6 participants