[fix](csv reader) fix csv parse error when use enclose with multi-char column separator #54581

Merged
liaoxin01 merged 1 commit into apache:master from sollhui:fix_csv_enclose
Aug 14, 2025

Conversation

@sollhui
Contributor

@sollhui sollhui commented Aug 11, 2025

What problem does this PR solve?

idx is the position in the buffer up to which parsing has progressed.

If the buffer does not contain a complete row, as shown in the following figure, idx becomes the length of the buffer and the buffer is then expanded. If a multi-character column separator straddles the end of the buffer, so that only part of it has been read, the parser cannot match the complete column separator when it resumes after the expansion, which results in a parse error.
image

The solution is to pre-read a span equal to the column separator's length when parsing the column separator.

image
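
The failure mode described above, and one way of looking ahead by the separator length, can be sketched in C++17 as follows. This is a minimal illustration under stated assumptions, not the code changed in this PR: `scan_naive` and `scan_lookahead` are hypothetical helpers, the two-byte separator `||` and the sample buffers are made up for the example, and enclose handling is left out entirely.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical sketch, not the Doris reader. Both functions scan `buf`
// from `idx`, append the start offset of every column separator found to
// `hits`, and return the offset at which scanning should resume once the
// buffer has been expanded. `sep` is assumed to be non-empty.

// Naive scan: every byte up to buf.size() is treated as examined, even
// when the tail holds only the first bytes of a separator.
std::size_t scan_naive(std::string_view buf, std::string_view sep,
                       std::size_t idx, std::vector<std::size_t>& hits) {
    while (idx < buf.size()) {
        if (buf.compare(idx, sep.size(), sep) == 0) {
            hits.push_back(idx);
            idx += sep.size();
        } else {
            ++idx;
        }
    }
    return idx;  // the loop only exits once idx has reached buf.size()
}

// Look-ahead scan: a position is consumed only if sep.size() bytes are
// available from it, so a partial separator at the tail is left unread
// and is re-examined after the buffer grows.
std::size_t scan_lookahead(std::string_view buf, std::string_view sep,
                           std::size_t idx, std::vector<std::size_t>& hits) {
    while (idx + sep.size() <= buf.size()) {
        if (buf.compare(idx, sep.size(), sep) == 0) {
            hits.push_back(idx);
            idx += sep.size();
        } else {
            ++idx;
        }
    }
    return idx;  // may be smaller than buf.size()
}
```

With the buffer `a|` later expanded to `a||b`, the naive scan returns 2 for the first chunk and then never sees the separator starting at offset 1, whereas the look-ahead scan returns 1 and finds the separator after expansion. In this sketch, any bytes still unread at the true end of input are shorter than the separator, so the caller can treat them as field data.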

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@sollhui
Contributor Author

sollhui commented Aug 11, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 33761 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit da3b8004855afc83e206ee94ff8b2216e7e4aa8c, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	min hot (ms)
q1	17591	5283	5040	5040
q2	1922	283	187	187
q3	10328	1265	731	731
q4	10207	992	523	523
q5	7504	2391	2270	2270
q6	179	164	132	132
q7	883	729	595	595
q8	9304	1200	1073	1073
q9	6759	5181	5111	5111
q10	6880	2379	1949	1949
q11	481	285	274	274
q12	349	342	219	219
q13	17770	3629	3046	3046
q14	244	248	219	219
q15	540	472	476	472
q16	420	430	362	362
q17	569	838	370	370
q18	7395	7122	7168	7122
q19	1088	932	555	555
q20	340	332	214	214
q21	3724	2613	2306	2306
q22	1081	1048	991	991
Total cold run time: 105558 ms
Total hot run time: 33761 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	min hot (ms)
q1	5116	5075	5121	5075
q2	237	318	222	222
q3	2145	2699	2286	2286
q4	1327	1799	1284	1284
q5	4209	4345	4552	4345
q6	216	177	128	128
q7	2023	1965	1793	1793
q8	2638	2772	2615	2615
q9	7634	7209	7184	7184
q10	3072	3337	3065	3065
q11	566	503	494	494
q12	672	789	641	641
q13	3383	3985	3323	3323
q14	339	293	278	278
q15	520	474	488	474
q16	457	509	441	441
q17	1170	1584	1378	1378
q18	7943	7746	7663	7663
q19	811	842	985	842
q20	2029	2081	1930	1930
q21	5005	4506	4282	4282
q22	1115	1044	1022	1022
Total cold run time: 52627 ms
Total hot run time: 50765 ms

@doris-robot

TPC-DS: Total hot run time: 184930 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit da3b8004855afc83e206ee94ff8b2216e7e4aa8c, data reload: false

query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	min hot (ms)
query1	1005	428	405	405
query2	6514	1738	1762	1738
query3	6757	220	221	220
query4	26245	23558	23302	23302
query5	4419	627	492	492
query6	313	215	197	197
query7	4630	505	290	290
query8	275	235	218	218
query9	8589	2903	2867	2867
query10	461	350	294	294
query11	15440	14909	14903	14903
query12	163	110	113	110
query13	1663	552	424	424
query14	9393	5802	5799	5799
query15	209	192	167	167
query16	7668	647	464	464
query17	1419	721	593	593
query18	2062	423	325	325
query19	198	185	177	177
query20	131	124	119	119
query21	216	130	108	108
query22	4083	4212	4077	4077
query23	34359	33263	33449	33263
query24	8161	2329	2353	2329
query25	535	503	398	398
query26	1234	269	153	153
query27	2714	502	338	338
query28	4321	2260	2232	2232
query29	741	553	477	477
query30	281	221	192	192
query31	903	797	720	720
query32	84	77	83	77
query33	566	372	346	346
query34	779	820	508	508
query35	796	830	751	751
query36	945	985	905	905
query37	130	123	86	86
query38	4099	3957	4050	3957
query39	1449	1391	1398	1391
query40	216	125	113	113
query41	59	57	57	57
query42	119	108	113	108
query43	512	498	480	480
query44	1333	861	857	857
query45	180	169	165	165
query46	845	991	650	650
query47	1735	1813	1746	1746
query48	376	416	311	311
query49	746	472	392	392
query50	652	675	384	384
query51	4168	4124	4096	4096
query52	110	111	98	98
query53	226	261	199	199
query54	586	582	514	514
query55	91	89	83	83
query56	306	309	311	309
query57	1177	1192	1121	1121
query58	281	264	276	264
query59	2571	2723	2564	2564
query60	344	332	324	324
query61	131	127	126	126
query62	840	759	663	663
query63	227	188	190	188
query64	4264	1003	694	694
query65	4250	4174	4256	4174
query66	1077	415	378	378
query67	15536	15355	14902	14902
query68	9266	903	571	571
query69	473	327	286	286
query70	1249	1161	1178	1161
query71	464	322	309	309
query72	5284	4769	4807	4769
query73	741	606	358	358
query74	8893	9003	8847	8847
query75	4251	3068	2588	2588
query76	3712	1118	741	741
query77	818	427	317	317
query78	9495	9595	8819	8819
query79	1789	831	590	590
query80	667	538	467	467
query81	459	259	222	222
query82	459	137	108	108
query83	277	254	236	236
query84	298	104	79	79
query85	881	361	323	323
query86	341	325	295	295
query87	4330	4244	4116	4116
query88	2777	2205	2157	2157
query89	382	317	281	281
query90	1943	213	216	213
query91	138	141	119	119
query92	90	69	63	63
query93	1121	970	645	645
query94	680	385	294	294
query95	393	309	298	298
query96	484	588	271	271
query97	2688	2732	2542	2542
query98	233	220	208	208
query99	1429	1424	1265	1265
Total cold run time: 273413 ms
Total hot run time: 184930 ms

@doris-robot

ClickBench: Total hot run time: 32.39 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit da3b8004855afc83e206ee94ff8b2216e7e4aa8c, data reload: false

query	cold (s)	hot run 1 (s)	hot run 2 (s)
query1	0.05	0.04	0.04
query2	0.08	0.04	0.04
query3	0.24	0.08	0.07
query4	1.61	0.12	0.11
query5	0.45	0.42	0.39
query6	1.18	0.64	0.65
query7	0.03	0.02	0.02
query8	0.04	0.03	0.04
query9	0.60	0.53	0.52
query10	0.58	0.57	0.57
query11	0.16	0.11	0.11
query12	0.14	0.11	0.12
query13	0.62	0.61	0.61
query14	0.81	0.82	0.84
query15	0.88	0.86	0.84
query16	0.37	0.39	0.38
query17	1.08	1.01	1.07
query18	0.21	0.20	0.20
query19	1.87	1.78	1.84
query20	0.02	0.01	0.01
query21	15.40	0.89	0.57
query22	0.75	1.25	0.76
query23	14.81	1.35	0.62
query24	7.28	0.79	0.66
query25	0.55	0.29	0.07
query26	0.46	0.17	0.12
query27	0.05	0.06	0.05
query28	10.16	0.86	0.42
query29	12.56	3.93	3.23
query30	3.10	3.02	2.99
query31	2.82	0.59	0.38
query32	3.24	0.54	0.47
query33	2.96	3.10	3.11
query34	16.12	5.47	4.86
query35	4.92	4.95	4.97
query36	0.68	0.50	0.49
query37	0.10	0.07	0.07
query38	0.04	0.05	0.04
query39	0.04	0.02	0.03
query40	0.19	0.13	0.15
query41	0.08	0.03	0.03
query42	0.04	0.03	0.03
query43	0.04	0.03	0.03
Total cold run time: 107.41 s
Total hot run time: 32.39 s

@doris-robot

BE UT Coverage Report

Increment line coverage 33.33% (1/3) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 59.01% (16606/28141)
Line Coverage 47.88% (150512/314355)
Region Coverage 36.71% (112721/307097)
Branch Coverage 39.63% (50014/126198)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 33.33% (1/3) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 81.55% (22520/27616)
Line Coverage 74.13% (233030/314357)
Region Coverage 61.22% (193048/315359)
Branch Coverage 65.37% (83509/127746)

@sollhui
Contributor Author

sollhui commented Aug 12, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 33650 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e284bede391dec399ba1264f562266fd4b9c76fe, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	min hot (ms)
q1	17621	5145	5332	5145
q2	1908	280	180	180
q3	10346	1290	682	682
q4	10214	987	505	505
q5	7502	2308	2326	2308
q6	171	159	127	127
q7	892	734	608	608
q8	9289	1284	1053	1053
q9	7367	5084	5034	5034
q10	6853	2359	1975	1975
q11	466	281	256	256
q12	346	347	216	216
q13	17757	3628	3058	3058
q14	230	230	219	219
q15	545	476	477	476
q16	421	418	366	366
q17	571	856	354	354
q18	7467	7078	7013	7013
q19	1075	931	550	550
q20	336	330	218	218
q21	3874	2517	2324	2324
q22	1078	1021	983	983
Total cold run time: 106329 ms
Total hot run time: 33650 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	min hot (ms)
q1	5339	5118	5089	5089
q2	241	326	216	216
q3	2204	2667	2304	2304
q4	1359	1756	1323	1323
q5	4183	4331	4461	4331
q6	223	172	127	127
q7	1992	1953	1853	1853
q8	2640	2596	2606	2596
q9	7362	7294	7340	7294
q10	3106	3255	2813	2813
q11	597	531	502	502
q12	690	765	628	628
q13	3448	3945	3227	3227
q14	309	311	280	280
q15	519	488	492	488
q16	449	668	466	466
q17	1151	1551	1412	1412
q18	7835	7729	7631	7631
q19	799	758	827	758
q20	2013	1989	1803	1803
q21	4724	4398	4223	4223
q22	1078	1044	991	991
Total cold run time: 52261 ms
Total hot run time: 50355 ms

@doris-robot

TPC-DS: Total hot run time: 184469 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e284bede391dec399ba1264f562266fd4b9c76fe, data reload: false

query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	min hot (ms)
query1	998	394	385	385
query2	6515	1713	1713	1713
query3	6739	222	226	222
query4	26639	23275	22716	22716
query5	4316	626	488	488
query6	292	196	206	196
query7	4623	502	281	281
query8	278	228	217	217
query9	9095	3064	2976	2976
query10	480	343	280	280
query11	15986	15199	15035	15035
query12	171	115	119	115
query13	1662	553	414	414
query14	8464	5661	5737	5661
query15	204	182	163	163
query16	7437	623	474	474
query17	1219	716	595	595
query18	1995	411	315	315
query19	191	239	155	155
query20	125	120	114	114
query21	210	124	107	107
query22	4152	4228	4094	4094
query23	34031	33269	33197	33197
query24	8227	2358	2360	2358
query25	547	488	402	402
query26	1241	271	162	162
query27	2737	503	344	344
query28	4348	2224	2203	2203
query29	758	593	454	454
query30	287	225	185	185
query31	877	796	719	719
query32	84	76	74	74
query33	556	367	327	327
query34	782	838	505	505
query35	784	827	780	780
query36	978	1058	916	916
query37	126	112	85	85
query38	4015	4008	3924	3924
query39	1490	1403	1382	1382
query40	211	124	112	112
query41	58	56	53	53
query42	118	109	108	108
query43	502	489	462	462
query44	1326	848	837	837
query45	175	170	162	162
query46	875	1009	633	633
query47	1767	1799	1744	1744
query48	374	404	310	310
query49	704	473	391	391
query50	648	690	397	397
query51	4146	4065	4040	4040
query52	108	106	103	103
query53	230	265	193	193
query54	579	576	510	510
query55	90	80	86	80
query56	310	305	294	294
query57	1162	1195	1130	1130
query58	282	264	260	260
query59	2669	2880	2590	2590
query60	338	329	320	320
query61	156	122	119	119
query62	806	740	664	664
query63	220	182	186	182
query64	4313	1024	673	673
query65	4306	4169	4196	4169
query66	1141	407	310	310
query67	15604	15206	15093	15093
query68	7913	910	563	563
query69	474	316	278	278
query70	1194	1148	1101	1101
query71	422	320	311	311
query72	5563	4786	4832	4786
query73	684	653	352	352
query74	8978	9116	8878	8878
query75	3341	3076	2614	2614
query76	3245	1142	748	748
query77	604	416	323	323
query78	9465	9772	8857	8857
query79	1927	797	579	579
query80	651	543	469	469
query81	499	257	215	215
query82	187	135	100	100
query83	253	246	238	238
query84	302	97	79	79
query85	755	365	333	333
query86	382	312	298	298
query87	4199	4271	4224	4224
query88	2806	2196	2194	2194
query89	391	318	281	281
query90	1964	215	213	213
query91	133	138	115	115
query92	84	65	68	65
query93	1553	966	637	637
query94	742	400	299	299
query95	392	310	301	301
query96	487	596	275	275
query97	2585	2682	2565	2565
query98	231	223	211	211
query99	1301	1376	1295	1295
Total cold run time: 270584 ms
Total hot run time: 184469 ms

@doris-robot

ClickBench: Total hot run time: 32.23 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit e284bede391dec399ba1264f562266fd4b9c76fe, data reload: false

query	cold (s)	hot run 1 (s)	hot run 2 (s)
query1	0.04	0.04	0.04
query2	0.09	0.04	0.04
query3	0.25	0.07	0.07
query4	1.64	0.11	0.11
query5	0.42	0.40	0.42
query6	1.17	0.66	0.65
query7	0.02	0.02	0.02
query8	0.04	0.03	0.04
query9	0.60	0.52	0.51
query10	0.58	0.57	0.58
query11	0.15	0.12	0.12
query12	0.15	0.12	0.11
query13	0.64	0.60	0.60
query14	0.80	0.82	0.83
query15	0.88	0.85	0.85
query16	0.39	0.39	0.40
query17	1.03	1.05	1.05
query18	0.21	0.19	0.20
query19	1.91	1.82	1.84
query20	0.01	0.01	0.01
query21	15.40	0.91	0.56
query22	0.75	1.26	0.62
query23	14.92	1.41	0.64
query24	7.17	1.03	0.50
query25	0.47	0.12	0.22
query26	0.59	0.16	0.12
query27	0.06	0.06	0.05
query28	9.97	0.93	0.44
query29	12.58	3.96	3.24
query30	3.02	3.03	2.94
query31	2.82	0.58	0.38
query32	3.25	0.54	0.47
query33	3.04	3.13	3.13
query34	15.99	5.45	4.84
query35	4.94	4.96	4.97
query36	0.69	0.51	0.49
query37	0.09	0.07	0.07
query38	0.06	0.05	0.03
query39	0.03	0.03	0.02
query40	0.18	0.14	0.14
query41	0.08	0.03	0.02
query42	0.04	0.02	0.03
query43	0.04	0.03	0.03
Total cold run time: 107.2 s
Total hot run time: 32.23 s

@sollhui sollhui marked this pull request as ready for review August 12, 2025 02:26
@doris-robot

BE UT Coverage Report

Increment line coverage 33.33% (1/3) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 59.00% (16604/28141)
Line Coverage 47.87% (150493/314355)
Region Coverage 36.69% (112675/307097)
Branch Coverage 39.63% (50012/126198)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 33.33% (1/3) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 81.48% (22501/27616)
Line Coverage 74.12% (232993/314357)
Region Coverage 61.33% (193421/315359)
Branch Coverage 65.46% (83625/127746)

@sollhui sollhui changed the title from "[fix](csv reader) fix csv parse error when use enclose with multi-char line delimiter" to "[fix](csv reader) fix csv parse error when use enclose with multi-char delimiter" Aug 13, 2025
@sollhui sollhui marked this pull request as draft August 13, 2025 02:59
@sollhui sollhui changed the title from "[fix](csv reader) fix csv parse error when use enclose with multi-char delimiter" to "[fix](csv reader) fix csv parse error when use enclose with multi-char column separator" Aug 13, 2025
@sollhui sollhui marked this pull request as ready for review August 13, 2025 06:43
@sollhui
Contributor Author

sollhui commented Aug 13, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 33974 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 78a8f0c84fe0ccb2031d28e9490ae63519b10f26, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	min hot (ms)
q1	17581	5247	5119	5119
q2	1914	278	180	180
q3	10326	1272	723	723
q4	10218	1015	525	525
q5	7523	2383	2344	2344
q6	176	164	135	135
q7	904	758	599	599
q8	9301	1490	1077	1077
q9	7010	5132	5171	5132
q10	6891	2374	1973	1973
q11	487	291	263	263
q12	343	350	225	225
q13	17761	3669	3099	3099
q14	236	244	207	207
q15	541	476	485	476
q16	437	438	372	372
q17	603	858	366	366
q18	7506	7116	7057	7057
q19	1088	967	560	560
q20	358	334	225	225
q21	4132	3228	2361	2361
q22	1048	1030	956	956
Total cold run time: 106384 ms
Total hot run time: 33974 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	min hot (ms)
q1	5153	5108	5104	5104
q2	250	328	223	223
q3	2189	2737	2307	2307
q4	1396	1812	1339	1339
q5	4221	4579	4515	4515
q6	215	173	127	127
q7	2049	1969	1798	1798
q8	2622	2726	2674	2674
q9	7623	7281	7164	7164
q10	3131	3299	2879	2879
q11	568	548	522	522
q12	694	783	624	624
q13	3562	3900	3329	3329
q14	300	294	288	288
q15	541	482	481	481
q16	467	512	448	448
q17	1196	1556	1388	1388
q18	7948	7763	7651	7651
q19	954	866	870	866
q20	2084	2149	1885	1885
q21	5118	4505	4347	4347
q22	1116	1025	1013	1013
Total cold run time: 53397 ms
Total hot run time: 50972 ms

@sollhui sollhui reopened this Aug 13, 2025
@sollhui
Contributor Author

sollhui commented Aug 13, 2025

run buildall

@sollhui
Contributor Author

sollhui commented Aug 13, 2025

run buildall

@sollhui
Contributor Author

sollhui commented Aug 13, 2025

run buildall

@sollhui
Contributor Author

sollhui commented Aug 13, 2025

run buildall

@github-actions github-actions bot removed the "approved" label (Indicates a PR has been approved by one committer.) Aug 13, 2025
@doris-robot

TPC-H: Total hot run time: 33927 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e4171d784afc41bf5cc2b178c406ae5fa7073caf, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	min hot (ms)
q1	17636	5223	5073	5073
q2	1908	295	175	175
q3	10313	1316	724	724
q4	10273	1027	514	514
q5	8391	2412	2328	2328
q6	202	157	130	130
q7	906	752	622	622
q8	9352	1300	1100	1100
q9	7250	5244	5093	5093
q10	6977	2427	2009	2009
q11	498	288	274	274
q12	365	344	222	222
q13	17794	3608	3018	3018
q14	229	229	214	214
q15	558	497	490	490
q16	425	424	383	383
q17	603	863	361	361
q18	7417	7054	7070	7054
q19	1449	968	576	576
q20	352	325	221	221
q21	3838	3199	2374	2374
q22	1064	1041	972	972
Total cold run time: 107800 ms
Total hot run time: 33927 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	min hot (ms)
q1	5222	5111	5075	5075
q2	244	331	220	220
q3	2201	2689	2291	2291
q4	1321	1807	1353	1353
q5	4261	4553	4610	4553
q6	216	173	132	132
q7	1962	1983	1850	1850
q8	2649	2648	2547	2547
q9	7419	7255	7274	7255
q10	3153	3291	2924	2924
q11	570	519	487	487
q12	985	771	621	621
q13	3452	3945	3289	3289
q14	310	476	279	279
q15	521	474	462	462
q16	419	487	437	437
q17	1191	1616	1371	1371
q18	8037	7845	7569	7569
q19	833	857	893	857
q20	1980	2058	1870	1870
q21	5043	4275	4340	4275
q22	1080	1049	994	994
Total cold run time: 53069 ms
Total hot run time: 50711 ms

@doris-robot

TPC-DS: Total hot run time: 183740 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e4171d784afc41bf5cc2b178c406ae5fa7073caf, data reload: false

query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	min hot (ms)
query1	988	404	424	404
query2	6556	1778	1754	1754
query3	6741	229	212	212
query4	26277	23309	22863	22863
query5	4325	629	469	469
query6	316	227	213	213
query7	4642	487	279	279
query8	265	251	216	216
query9	8606	2833	2837	2833
query10	481	323	280	280
query11	15782	15028	14728	14728
query12	160	112	112	112
query13	1661	557	426	426
query14	8507	5674	5752	5674
query15	200	185	177	177
query16	7158	649	434	434
query17	951	725	589	589
query18	1993	421	317	317
query19	192	196	166	166
query20	132	120	117	117
query21	220	125	109	109
query22	4138	4278	4140	4140
query23	34372	33192	33313	33192
query24	8065	2453	2337	2337
query25	540	464	391	391
query26	1230	269	154	154
query27	2736	489	339	339
query28	4411	2229	2193	2193
query29	765	550	450	450
query30	283	218	190	190
query31	893	774	728	728
query32	83	75	75	75
query33	544	365	331	331
query34	805	833	519	519
query35	789	818	741	741
query36	978	1029	908	908
query37	118	110	93	93
query38	4073	4061	3936	3936
query39	1466	1420	1390	1390
query40	216	128	116	116
query41	60	55	56	55
query42	119	108	109	108
query43	511	484	461	461
query44	1360	846	841	841
query45	200	173	164	164
query46	864	997	642	642
query47	1771	1791	1685	1685
query48	384	422	325	325
query49	735	483	396	396
query50	633	678	412	412
query51	4099	4093	4091	4091
query52	114	110	100	100
query53	247	257	193	193
query54	593	584	517	517
query55	85	93	86	86
query56	308	311	286	286
query57	1193	1167	1118	1118
query58	278	268	295	268
query59	2625	2721	2534	2534
query60	346	327	332	327
query61	125	129	156	129
query62	814	713	626	626
query63	224	187	190	187
query64	4369	1026	734	734
query65	4292	4204	4219	4204
query66	1153	425	319	319
query67	15468	15360	15089	15089
query68	8089	908	563	563
query69	478	316	278	278
query70	1177	1206	1124	1124
query71	450	343	324	324
query72	5580	4665	4590	4590
query73	720	567	354	354
query74	8887	9051	8571	8571
query75	3756	3041	2564	2564
query76	3628	1123	728	728
query77	803	396	334	334
query78	9595	9530	8797	8797
query79	2757	826	579	579
query80	619	531	494	494
query81	514	259	221	221
query82	458	136	104	104
query83	282	247	235	235
query84	301	110	85	85
query85	790	363	354	354
query86	395	306	293	293
query87	4263	4343	4200	4200
query88	3471	2203	2181	2181
query89	390	316	283	283
query90	1823	229	215	215
query91	141	139	120	120
query92	89	73	65	65
query93	1888	967	638	638
query94	673	436	308	308
query95	390	326	302	302
query96	499	567	278	278
query97	2645	2666	2587	2587
query98	234	217	205	205
query99	1437	1415	1274	1274
Total cold run time: 272850 ms
Total hot run time: 183740 ms

@doris-robot

ClickBench: Total hot run time: 32.6 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit e4171d784afc41bf5cc2b178c406ae5fa7073caf, data reload: false

query	cold (s)	hot run 1 (s)	hot run 2 (s)
query1	0.04	0.04	0.03
query2	0.07	0.04	0.04
query3	0.24	0.08	0.07
query4	1.63	0.11	0.11
query5	0.42	0.42	0.40
query6	1.19	0.64	0.64
query7	0.02	0.02	0.02
query8	0.05	0.03	0.03
query9	0.61	0.53	0.52
query10	0.57	0.58	0.57
query11	0.15	0.10	0.10
query12	0.15	0.11	0.11
query13	0.62	0.60	0.60
query14	0.79	0.81	0.83
query15	0.88	0.87	0.86
query16	0.39	0.39	0.38
query17	1.05	1.04	1.05
query18	0.21	0.19	0.20
query19	1.95	1.89	1.81
query20	0.01	0.01	0.02
query21	15.40	0.98	0.56
query22	0.77	1.13	0.75
query23	14.92	1.37	0.64
query24	6.95	0.96	1.01
query25	0.54	0.31	0.08
query26	0.55	0.15	0.13
query27	0.05	0.04	0.04
query28	9.58	0.93	0.43
query29	12.55	3.92	3.22
query30	3.06	3.00	3.00
query31	2.82	0.59	0.38
query32	3.23	0.55	0.47
query33	3.07	3.04	3.08
query34	16.04	5.45	4.81
query35	4.90	4.91	4.96
query36	0.72	0.51	0.50
query37	0.10	0.07	0.07
query38	0.05	0.04	0.04
query39	0.03	0.02	0.02
query40	0.18	0.13	0.14
query41	0.09	0.03	0.03
query42	0.03	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 106.71 s
Total hot run time: 32.6 s

@doris-robot

BE UT Coverage Report

Increment line coverage 50.00% (2/4) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 59.04% (16627/28164)
Line Coverage 47.93% (150825/314685)
Region Coverage 36.74% (112946/307417)
Branch Coverage 39.68% (50117/126309)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 50.00% (2/4) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 81.73% (22590/27641)
Line Coverage 74.35% (233970/314692)
Region Coverage 61.52% (194199/315678)
Branch Coverage 65.67% (83958/127855)

Contributor

@dataroaring dataroaring left a comment

LGTM

@github-actions github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) Aug 14, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 50.00% (2/4) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 81.73% (22590/27641)
Line Coverage 74.35% (233985/314692)
Region Coverage 61.56% (194343/315678)
Branch Coverage 65.67% (83968/127855)

Contributor

@liaoxin01 liaoxin01 left a comment

LGTM

@liaoxin01 liaoxin01 merged commit d0f3af0 into apache:master Aug 14, 2025
27 of 29 checks passed
github-actions bot pushed a commit that referenced this pull request Aug 14, 2025
…r column separator (#54581)

morrySnow pushed a commit that referenced this pull request Aug 15, 2025
…th multi-char column separator #54581 (#54764)

Cherry-picked from #54581

Co-authored-by: hui lai <laihui@selectdb.com>
sollhui added a commit to sollhui/doris that referenced this pull request Aug 20, 2025
…r column separator (apache#54581)

dataroaring pushed a commit that referenced this pull request Aug 22, 2025
…th multi-char column separator (#54581) (#55052)

pick #54581

@gavinchou gavinchou mentioned this pull request Sep 1, 2025

Labels

approved, dev/2.1.x, dev/2.1.x-conflict, dev/3.0.8-merged, dev/3.1.0-merged, reviewed, usercase
