branch-3.0: [opt](inverted index) create non analyzer when parser is none for inverted index #54666 #54795

Merged
dataroaring merged 2 commits into apache:branch-3.0 from airborne12:pick_54666_to_origin_branch-3.0 on Aug 18, 2025

@airborne12
Member

cherry pick from #54666

…erted index (apache#54666)

Issue Number: close #xxx

Related PR: apache#54619

Problem Summary:
When no parser is specified, the inverted index writer currently creates
a default analyzer (the simple analyzer), which adds unnecessary
performance overhead. This PR sets the analyzer to nullptr in that case,
so no analyzer is constructed.
Note: This PR should be merged together with or after apache#54619.
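
To illustrate the idea in the summary above, here is a minimal, self-contained sketch. It does not reproduce the Doris or CLucene code: `ParserType`, `Analyzer`, `SimpleAnalyzer`, `create_analyzer`, and `terms_for_value` are hypothetical names invented for this example. It also assumes that a value without a parser is indexed as a single untokenized term, so callers have to handle a null analyzer.

```cpp
#include <cctype>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; NOT the Doris or CLucene types.
enum class ParserType { None, English };

struct Analyzer {
    virtual ~Analyzer() = default;
    virtual std::vector<std::string> tokenize(const std::string& text) const = 0;
};

// Lowercases the input and splits it on non-alphanumeric characters.
struct SimpleAnalyzer : Analyzer {
    std::vector<std::string> tokenize(const std::string& text) const override {
        std::vector<std::string> tokens;
        std::string current;
        for (unsigned char c : text) {
            if (std::isalnum(c)) {
                current.push_back(static_cast<char>(std::tolower(c)));
            } else if (!current.empty()) {
                tokens.push_back(current);
                current.clear();
            }
        }
        if (!current.empty()) {
            tokens.push_back(current);
        }
        return tokens;
    }
};

// Returns nullptr when no parser is configured, so no analyzer is constructed.
std::unique_ptr<Analyzer> create_analyzer(ParserType parser) {
    if (parser == ParserType::None) {
        return nullptr;
    }
    return std::make_unique<SimpleAnalyzer>();
}

// Assumption: with a null analyzer the whole value is indexed as one term.
std::vector<std::string> terms_for_value(const Analyzer* analyzer,
                                         const std::string& value) {
    if (analyzer == nullptr) {
        return {value};
    }
    return analyzer->tokenize(value);
}
```

In this sketch the cost of building and running an analyzer is paid only when a parser is actually configured; the trade-off is that every caller must check for the null case before tokenizing.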
@airborne12
Member Author

run buildall

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@airborne12
Member Author

run buildall

@doris-robot

TPC-H: Total hot run time: 40066 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit be08b17e989772fe7b68611d1f7dd7ed763036b6, data reload: false

------ Round 1 ----------------------------------
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	17619	6751	6659	6659
q2	2043	199	189	189
q3	10546	1134	1148	1134
q4	10432	762	747	747
q5	7757	3135	2867	2867
q6	210	137	133	133
q7	983	617	605	605
q8	9363	1961	2027	1961
q9	6705	6411	6494	6411
q10	6987	2303	2326	2303
q11	457	271	265	265
q12	397	211	205	205
q13	17774	2995	2974	2974
q14	244	215	213	213
q15	517	463	465	463
q16	452	380	382	380
q17	978	611	542	542
q18	7315	6765	6740	6740
q19	1388	1091	999	999
q20	483	197	200	197
q21	3869	3166	3071	3071
q22	1098	1008	1018	1008
Total cold run time: 107617 ms
Total hot run time: 40066 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	6993	6562	6558	6558
q2	338	224	234	224
q3	2934	2922	2906	2906
q4	2030	1824	1831	1824
q5	5742	5725	5706	5706
q6	207	125	124	124
q7	2247	1766	1812	1766
q8	3402	3531	3569	3531
q9	8772	8906	8843	8843
q10	3563	3542	3509	3509
q11	600	488	486	486
q12	835	609	654	609
q13	10484	3207	3138	3138
q14	306	272	272	272
q15	524	467	475	467
q16	486	446	451	446
q17	1857	1624	1615	1615
q18	8227	7712	7691	7691
q19	1712	1723	1583	1583
q20	2055	1831	1825	1825
q21	5239	5048	4952	4952
q22	1165	1086	1019	1019
Total cold run time: 69718 ms
Total hot run time: 59094 ms

@doris-robot

TPC-DS: Total hot run time: 192064 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit be08b17e989772fe7b68611d1f7dd7ed763036b6, data reload: false

query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query1	927	414	406	406
query2	6265	1985	1973	1973
query3	8694	193	194	193
query4	33916	23485	23383	23383
query5	3644	448	442	442
query6	278	172	173	172
query7	4228	315	319	315
query8	283	209	221	209
query9	9525	2573	2547	2547
query10	459	260	261	260
query11	17948	15142	15383	15142
query12	168	101	106	101
query13	1572	442	412	412
query14	8786	7719	6906	6906
query15	290	177	184	177
query16	7996	496	517	496
query17	1624	599	600	599
query18	2161	317	330	317
query19	393	160	166	160
query20	121	110	116	110
query21	201	109	107	107
query22	4777	4662	4641	4641
query23	34899	34410	34047	34047
query24	11646	2876	2806	2806
query25	702	431	434	431
query26	1789	172	171	171
query27	2805	349	364	349
query28	7824	2167	2183	2167
query29	1039	462	456	456
query30	269	160	164	160
query31	1050	785	820	785
query32	106	57	63	57
query33	778	338	324	324
query34	941	513	525	513
query35	957	752	737	737
query36	1119	977	951	951
query37	181	67	70	67
query38	4063	3938	3942	3938
query39	1594	1471	1483	1471
query40	258	96	97	96
query41	49	51	46	46
query42	119	99	99	99
query43	506	490	488	488
query44	1270	819	803	803
query45	194	180	181	180
query46	1186	734	720	720
query47	2008	1910	1943	1910
query48	478	380	398	380
query49	1105	395	407	395
query50	858	416	433	416
query51	7559	7136	7176	7136
query52	98	92	88	88
query53	258	182	187	182
query54	1410	470	468	468
query55	82	76	75	75
query56	271	246	251	246
query57	1379	1185	1216	1185
query58	224	212	213	212
query59	3278	3282	3204	3204
query60	299	260	263	260
query61	114	109	112	109
query62	883	683	708	683
query63	219	195	191	191
query64	5157	682	647	647
query65	3396	3333	3315	3315
query66	1443	300	323	300
query67	16296	15709	15525	15525
query68	5656	561	553	553
query69	442	255	266	255
query70	1224	1082	1077	1077
query71	339	248	257	248
query72	6234	4079	4027	4027
query73	747	352	370	352
query74	10644	8930	9195	8930
query75	3372	2595	2667	2595
query76	3341	1080	1081	1080
query77	412	268	266	266
query78	10608	9686	9529	9529
query79	1649	597	579	579
query80	1079	428	422	422
query81	543	218	217	217
query82	605	87	88	87
query83	233	149	157	149
query84	234	82	82	82
query85	1329	320	294	294
query86	395	294	273	273
query87	4361	4198	4233	4198
query88	3419	2379	2341	2341
query89	416	285	289	285
query90	1996	181	183	181
query91	190	152	145	145
query92	57	52	51	51
query93	1759	539	544	539
query94	929	303	290	290
query95	353	254	253	253
query96	606	281	290	281
query97	3292	3152	3148	3148
query98	207	208	193	193
query99	1492	1350	1287	1287
Total cold run time: 304578 ms
Total hot run time: 192064 ms

@doris-robot

ClickBench: Total hot run time: 29.51 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit be08b17e989772fe7b68611d1f7dd7ed763036b6, data reload: false

query	cold run (s)	hot run 1 (s)	hot run 2 (s)
query1	0.03	0.02	0.03
query2	0.06	0.04	0.03
query3	0.24	0.07	0.07
query4	1.64	0.10	0.10
query5	0.53	0.52	0.49
query6	1.13	0.72	0.73
query7	0.03	0.02	0.01
query8	0.04	0.03	0.04
query9	0.58	0.50	0.49
query10	0.56	0.55	0.55
query11	0.14	0.10	0.10
query12	0.14	0.10	0.11
query13	0.61	0.60	0.60
query14	0.78	0.79	0.79
query15	0.82	0.82	0.82
query16	0.39	0.42	0.38
query17	1.05	1.05	1.05
query18	0.24	0.22	0.21
query19	1.96	1.77	1.79
query20	0.02	0.00	0.02
query21	15.40	0.58	0.59
query22	2.24	2.40	1.48
query23	16.97	0.84	0.82
query24	3.53	1.69	0.65
query25	0.21	0.12	0.10
query26	0.56	0.14	0.13
query27	0.06	0.04	0.03
query28	9.88	0.48	0.45
query29	12.60	3.23	3.28
query30	0.25	0.07	0.06
query31	2.85	0.39	0.38
query32	3.25	0.46	0.45
query33	2.99	3.01	3.00
query34	16.83	4.55	4.56
query35	4.58	4.51	4.57
query36	0.67	0.47	0.47
query37	0.08	0.07	0.05
query38	0.04	0.04	0.03
query39	0.03	0.02	0.02
query40	0.16	0.14	0.12
query41	0.08	0.02	0.03
query42	0.03	0.02	0.02
query43	0.03	0.03	0.03
Total cold run time: 104.31 s
Total hot run time: 29.51 s

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 100.00% (4/4) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 42.06% (11231/26705)
Line Coverage 32.57% (96106/295054)
Region Coverage 30.50% (55160/180843)
Branch Coverage 26.82% (27285/101748)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (4/4) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 74.95% (19671/26246)
Line Coverage 68.20% (200685/294270)
Region Coverage 66.32% (120111/181111)
Branch Coverage 59.64% (60915/102138)

@airborne12 changed the title from "[opt](inverted index) create non analyzer when parser is none for inverted index #54666" to "branch-3.0: [opt](inverted index) create non analyzer when parser is none for inverted index #54666" on Aug 15, 2025
@dataroaring left a comment
Contributor

LGTM

@dataroaring merged commit b4332c7 into apache:branch-3.0 on Aug 18, 2025
23 of 26 checks passed
@gavinchou mentioned this pull request on Sep 1, 2025