[opt](inverted index) create non analyzer when parser is none for inverted index#54666

Merged
airborne12 merged 1 commit into apache:master from airborne12:fix-case
Aug 14, 2025
@airborne12
Member

@airborne12 airborne12 commented Aug 13, 2025

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #54619

Problem Summary:
When no parser is specified (parser is none), the inverted index writer currently still creates a default analyzer (the simple analyzer), which adds unnecessary performance overhead. This PR instead sets the analyzer to nullptr in that case, avoiding the overhead.
Note: This PR should be merged together with or after #54619.
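
A minimal sketch of the idea described above, not the actual Doris code: every name here (`ParserType`, `Analyzer`, `WhitespaceAnalyzer`, `make_analyzer`, `index_terms`) is a hypothetical stand-in invented for illustration. It assumes the writer can treat a null analyzer as "index the value as a single untokenized term".

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not Doris types.
enum class ParserType { kNone, kEnglish };

struct Analyzer {
    virtual ~Analyzer() = default;
    virtual std::vector<std::string> tokenize(const std::string& text) const = 0;
};

// Toy analyzer that splits on single spaces.
struct WhitespaceAnalyzer : Analyzer {
    std::vector<std::string> tokenize(const std::string& text) const override {
        std::vector<std::string> tokens;
        std::string current;
        for (char c : text) {
            if (c == ' ') {
                if (!current.empty()) {
                    tokens.push_back(current);
                    current.clear();
                }
            } else {
                current.push_back(c);
            }
        }
        if (!current.empty()) {
            tokens.push_back(current);
        }
        return tokens;
    }
};

// Return nullptr when no parser is configured, so no analyzer is built at all.
std::unique_ptr<Analyzer> make_analyzer(ParserType parser) {
    if (parser == ParserType::kNone) {
        return nullptr;
    }
    return std::make_unique<WhitespaceAnalyzer>();
}

// With a null analyzer the whole value is kept as one term (assumed behavior).
std::vector<std::string> index_terms(const Analyzer* analyzer, const std::string& value) {
    if (analyzer == nullptr) {
        return {value};
    }
    return analyzer->tokenize(value);
}
```

In this sketch every consumer has to handle the null case explicitly, which may be why the change is noted as depending on the related PR.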

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Aug 13, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@airborne12
Member Author

run buildall

@doris-robot

TPC-H: Total hot run time: 33861 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 53a661628f41f99a65c372574da3e74d471caccb, data reload: false

------ Round 1 ----------------------------------
q1	17619	5283	5084	5084
q2	1914	290	181	181
q3	10319	1274	701	701
q4	10218	1002	527	527
q5	7538	2361	2335	2335
q6	177	156	135	135
q7	910	770	591	591
q8	9293	1283	1109	1109
q9	6913	5060	5006	5006
q10	6914	2371	1986	1986
q11	483	291	262	262
q12	347	348	216	216
q13	17796	3589	2986	2986
q14	223	249	212	212
q15	569	464	481	464
q16	425	430	366	366
q17	592	839	364	364
q18	7204	7175	7136	7136
q19	1101	946	571	571
q20	344	332	221	221
q21	4131	3243	2441	2441
q22	1088	1025	967	967
Total cold run time: 106118 ms
Total hot run time: 33861 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5154	5103	5126	5103
q2	243	320	219	219
q3	2156	2698	2291	2291
q4	1427	1861	1304	1304
q5	4215	4404	4493	4404
q6	217	173	129	129
q7	2020	1983	1736	1736
q8	2642	2706	2570	2570
q9	7184	7138	7264	7138
q10	3085	3348	2904	2904
q11	595	518	503	503
q12	704	811	690	690
q13	3546	3904	3351	3351
q14	286	305	301	301
q15	520	479	478	478
q16	454	512	483	483
q17	1177	1581	1429	1429
q18	7702	7602	7607	7602
q19	820	859	874	859
q20	1990	2180	1783	1783
q21	4745	4321	4323	4321
q22	1068	1013	980	980
Total cold run time: 51950 ms
Total hot run time: 50578 ms

@doris-robot

TPC-DS: Total hot run time: 184829 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 53a661628f41f99a65c372574da3e74d471caccb, data reload: false

query1	992	378	429	378
query2	6544	1720	1766	1720
query3	6749	222	217	217
query4	26549	23523	22932	22932
query5	4330	637	478	478
query6	313	218	199	199
query7	4622	505	289	289
query8	283	244	225	225
query9	8587	2855	2848	2848
query10	469	343	297	297
query11	15872	15178	14865	14865
query12	170	116	115	115
query13	1653	563	418	418
query14	8688	5837	6017	5837
query15	217	202	163	163
query16	7389	670	467	467
query17	1220	744	625	625
query18	1997	427	337	337
query19	201	195	169	169
query20	139	134	123	123
query21	220	130	109	109
query22	4268	4361	4072	4072
query23	34087	33166	33389	33166
query24	8239	2336	2343	2336
query25	540	453	414	414
query26	1229	266	154	154
query27	2741	506	373	373
query28	4314	2216	2203	2203
query29	775	571	473	473
query30	299	225	196	196
query31	886	806	710	710
query32	79	75	72	72
query33	554	377	352	352
query34	799	858	510	510
query35	813	840	745	745
query36	979	1017	921	921
query37	121	108	90	90
query38	4102	4024	3959	3959
query39	1455	1415	1391	1391
query40	214	127	119	119
query41	62	56	54	54
query42	121	109	115	109
query43	499	501	458	458
query44	1375	843	849	843
query45	176	170	161	161
query46	867	1014	675	675
query47	1745	1789	1716	1716
query48	402	422	316	316
query49	738	494	418	418
query50	692	689	400	400
query51	4073	4164	4104	4104
query52	114	117	105	105
query53	244	257	191	191
query54	590	578	524	524
query55	92	90	93	90
query56	317	308	297	297
query57	1209	1194	1138	1138
query58	271	257	260	257
query59	2580	2742	2657	2657
query60	359	333	356	333
query61	129	122	125	122
query62	818	718	659	659
query63	218	192	194	192
query64	4299	1010	725	725
query65	4304	4222	4197	4197
query66	1152	407	327	327
query67	15350	15034	14886	14886
query68	8208	920	569	569
query69	495	349	273	273
query70	1276	1145	1034	1034
query71	492	334	310	310
query72	5357	4756	4871	4756
query73	729	611	360	360
query74	9213	9086	8968	8968
query75	3884	3082	2642	2642
query76	3743	1164	750	750
query77	793	490	320	320
query78	9614	9650	8817	8817
query79	3217	811	597	597
query80	626	539	487	487
query81	484	252	220	220
query82	476	134	106	106
query83	283	268	236	236
query84	292	107	83	83
query85	780	358	326	326
query86	360	296	304	296
query87	4320	4266	4179	4179
query88	3225	2206	2176	2176
query89	423	305	286	286
query90	1939	228	229	228
query91	137	178	120	120
query92	88	69	72	69
query93	2368	986	632	632
query94	679	418	311	311
query95	399	315	311	311
query96	487	584	277	277
query97	2636	2671	2573	2573
query98	238	217	220	217
query99	1467	1401	1279	1279
Total cold run time: 275222 ms
Total hot run time: 184829 ms

@doris-robot

ClickBench: Total hot run time: 32.84 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 53a661628f41f99a65c372574da3e74d471caccb, data reload: false

query1	0.04	0.04	0.04
query2	0.09	0.04	0.04
query3	0.24	0.07	0.08
query4	1.63	0.11	0.11
query5	0.43	0.42	0.41
query6	1.16	0.63	0.67
query7	0.02	0.01	0.01
query8	0.05	0.04	0.04
query9	0.61	0.52	0.52
query10	0.58	0.57	0.57
query11	0.15	0.11	0.11
query12	0.15	0.13	0.12
query13	0.63	0.62	0.61
query14	0.82	0.82	0.84
query15	0.86	0.84	0.90
query16	0.39	0.39	0.40
query17	1.04	1.03	1.07
query18	0.21	0.20	0.20
query19	1.93	1.85	1.82
query20	0.01	0.01	0.02
query21	15.39	0.92	0.55
query22	0.79	1.29	0.82
query23	14.74	1.40	0.64
query24	6.69	1.55	1.02
query25	0.48	0.30	0.05
query26	0.59	0.15	0.13
query27	0.08	0.06	0.05
query28	9.93	0.98	0.43
query29	12.54	3.93	3.23
query30	3.12	3.02	2.94
query31	2.82	0.60	0.39
query32	3.25	0.55	0.47
query33	3.10	3.28	3.12
query34	16.06	5.46	4.87
query35	4.94	4.92	4.93
query36	0.69	0.51	0.49
query37	0.09	0.08	0.07
query38	0.06	0.04	0.04
query39	0.04	0.03	0.04
query40	0.18	0.14	0.13
query41	0.08	0.04	0.02
query42	0.03	0.03	0.02
query43	0.04	0.03	0.02
Total cold run time: 106.77 s
Total hot run time: 32.84 s

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 100.00% (3/3) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 59.03% (16629/28172)
Line Coverage 47.90% (150836/314913)
Region Coverage 36.71% (112920/307595)
Branch Coverage 39.65% (50149/126469)

@airborne12
Member Author

run check_coverage

Member

@eldenmoon eldenmoon left a comment

LGTM

@github-actions github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) Aug 13, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (3/3) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 81.86% (22633/27649)
Line Coverage 74.49% (234576/314920)
Region Coverage 61.73% (194991/315856)
Branch Coverage 65.86% (84316/128015)

Contributor

@zzzxl1993 zzzxl1993 left a comment

LGTM

@airborne12 airborne12 merged commit cea20ae into apache:master Aug 14, 2025
28 of 30 checks passed
@airborne12 airborne12 deleted the fix-case branch August 14, 2025 01:55
airborne12 added a commit to airborne12/apache-doris that referenced this pull request Aug 14, 2025
…erted index (apache#54666)

Issue Number: close #xxx

Related PR: apache#54619

Problem Summary:
When no parser is specified, the inverted index writer currently creates
a default analyzer (simple analyzer), which can cause unnecessary
performance overhead. This PR addresses this by setting the analyzer to
nullptr to avoid the overhead.
Note: This PR should be merged together with or after apache#54619.
morrySnow pushed a commit that referenced this pull request Aug 15, 2025
dataroaring pushed a commit that referenced this pull request Aug 18, 2025
Labels

approved (Indicates a PR has been approved by one committer.), dev/3.0.8-merged, dev/3.1.0-merged, reviewed, usercase (Important user case type label)

8 participants