
[fix](insert-into) fix insert into lose data #29802

Merged
dataroaring merged 2 commits into apache:master from sollhui:insert_into_lose_data on Jan 10, 2024

@sollhui (Contributor) commented Jan 10, 2024

Proposed changes

During a load, the BE sends periodic status reports to the FE, and an intermediate report may be processed concurrently with the final report. The final report decrements the counter to zero, but the intermediate report, which carries no commit info, can be the one that triggers the commit. The commit then proceeds without that commit info, and the loaded data is never published.
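The race can be replayed deterministically. The sketch below is a simplified Python model, not Doris code (the real FE is written in Java and structured differently): `Report`, `LoadCoordinator`, the split of final-report handling into two steps, and the `skip_unfinished_reports` switch are all hypothetical. The switch mirrors the behavior the follow-up commits attribute to this PR, namely skipping reports without the isDone flag.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Report:
    """Hypothetical execution report sent from a BE to the FE."""
    is_done: bool
    commit_infos: List[str] = field(default_factory=list)


class LoadCoordinator:
    """Hypothetical FE-side bookkeeping for one load; not Doris's real classes."""

    def __init__(self, pending: int, skip_unfinished_reports: bool) -> None:
        self.pending = pending                      # final reports still expected
        self.commit_infos: List[str] = []           # collected from final reports
        self.committed_with: Optional[List[str]] = None
        self.skip_unfinished_reports = skip_unfinished_reports

    def _maybe_commit(self) -> None:
        # Check-then-act: whoever first observes pending == 0 fires the commit.
        if self.pending == 0 and self.committed_with is None:
            self.committed_with = list(self.commit_infos)

    def final_report_step1(self, report: Report) -> None:
        self.pending -= 1                           # the counter reaches zero here...

    def final_report_step2(self, report: Report) -> None:
        self.commit_infos.extend(report.commit_infos)  # ...but commit info lands later
        self._maybe_commit()

    def on_periodic_report(self, report: Report) -> None:
        if self.skip_unfinished_reports and not report.is_done:
            return                                  # fix: periodic reports cannot commit
        self._maybe_commit()


def replay(skip_unfinished_reports: bool) -> LoadCoordinator:
    """Replay the racy interleaving in a fixed order, without threads."""
    coord = LoadCoordinator(pending=1, skip_unfinished_reports=skip_unfinished_reports)
    final = Report(is_done=True, commit_infos=["tablet-commit-info"])
    periodic = Report(is_done=False)
    coord.final_report_step1(final)      # final report: counter -> 0
    coord.on_periodic_report(periodic)   # a concurrent periodic report sneaks in
    coord.final_report_step2(final)      # final report: attach commit info
    return coord


if __name__ == "__main__":
    print(replay(skip_unfinished_reports=False).committed_with)  # []
    print(replay(skip_unfinished_reports=True).committed_with)   # ['tablet-commit-info']
```

Without the guard, the periodic report commits with an empty commit-info list and the later final report finds the commit already done; with the guard, only the final report can reach the commit path.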


@sollhui (Contributor Author) commented Jan 10, 2024

run buildall

@doris-robot

(From new machine) TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.87 seconds
stream load tsv: 558 seconds loaded 74807831229 Bytes, about 127 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 33 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 21.0 seconds inserted 10000000 Rows, about 476K ops/s
storage size: 17183584482 Bytes
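The "about N MB/s" and "about NK ops/s" figures appear consistent with plain division, taking MB as 2^20 bytes and truncating rather than rounding (for example, 861443392 bytes in 33 s is about 24.9 MiB/s and is reported as 24 MB/s). This is an inference from the numbers, not a documented formula of the robot, and the helper names below are hypothetical.

```python
MIB = 1024 * 1024  # assumed meaning of "MB" in the robot output


def mb_per_s(num_bytes: int, seconds: float) -> int:
    """Throughput in MiB/s, truncated toward zero (assumed robot convention)."""
    return int(num_bytes / seconds / MIB)


def k_ops_per_s(rows: int, seconds: float) -> int:
    """Row rate in thousands of rows per second, truncated (assumed convention)."""
    return int(rows / seconds) // 1000


if __name__ == "__main__":
    print(f"stream load tsv: about {mb_per_s(74807831229, 558)} MB/s")
    print(f"insert into select: about {k_ops_per_s(10_000_000, 21.0)}K ops/s")
```

The same two helpers reproduce every throughput line in the later load-test and pipeline reports as well.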

@liaoxin01 (Contributor)

Please add a description of the PR.

@dataroaring (Contributor) previously approved these changes Jan 10, 2024

LGTM

@github-actions bot added the "approved" label Jan 10, 2024 (Indicates a PR has been approved by one committer.)
@github-actions (Contributor)

PR approved by at least one committer and no changes requested.

@github-actions (Contributor)

PR approved by anyone and no changes requested.

@sollhui (Contributor Author) commented Jan 10, 2024

run buildall

@github-actions bot removed the "approved" label Jan 10, 2024 (Indicates a PR has been approved by one committer.)
@doris-robot

TPC-H: Total hot run time: 38515 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 735c594aecfbd983368dc4ac611a9e5ad767f8ac, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	18055	4992	4980	4980
q2	2026	145	138	138
q3	10694	1066	1117	1066
q4	10534	749	775	749
q5	7864	3133	3138	3133
q6	204	126	125	125
q7	877	498	485	485
q8	9310	1974	1977	1974
q9	6789	6469	6389	6389
q10	8254	3101	3079	3079
q11	408	206	217	206
q12	360	190	192	190
q13	18104	3405	3448	3405
q14	236	217	215	215
q15	552	520	510	510
q16	437	387	382	382
q17	927	491	485	485
q18	7255	6729	6656	6656
q19	1569	1320	1311	1311
q20	554	295	300	295
q21	2766	2439	2429	2429
q22	363	314	313	313
Total cold run time: 108138 ms
Total hot run time: 38515 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	5022	4991	4930	4930
q2	308	199	209	199
q3	3340	3305	3300	3300
q4	2232	2224	2229	2224
q5	5785	5800	5794	5794
q6	192	118	121	118
q7	2302	1865	1866	1865
q8	3452	3542	3551	3542
q9	8894	8849	8839	8839
q10	3743	3823	3814	3814
q11	557	435	440	435
q12	792	614	600	600
q13	6459	3216	3248	3216
q14	290	262	259	259
q15	571	524	514	514
q16	505	469	457	457
q17	2025	2030	1999	1999
q18	8789	8334	8574	8334
q19	1619	1627	1626	1626
q20	2198	1965	1968	1965
q21	6149	5809	5866	5809
q22	596	509	501	501
Total cold run time: 65820 ms
Total hot run time: 60340 ms
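The summary lines follow directly from the table: in every row of both rounds, the last column is the smaller of the two hot runs, the cold total is the sum of the first timing column, and the hot total is the sum of the last. A short check against Round 1 (the parser and function names are illustrative, not part of the tpch-tools scripts):

```python
from typing import List, Tuple

# Round 1 rows copied from the table above: query, cold, hot 1, hot 2, best hot (ms).
ROUND1 = """
q1 18055 4992 4980 4980
q2 2026 145 138 138
q3 10694 1066 1117 1066
q4 10534 749 775 749
q5 7864 3133 3138 3133
q6 204 126 125 125
q7 877 498 485 485
q8 9310 1974 1977 1974
q9 6789 6469 6389 6389
q10 8254 3101 3079 3079
q11 408 206 217 206
q12 360 190 192 190
q13 18104 3405 3448 3405
q14 236 217 215 215
q15 552 520 510 510
q16 437 387 382 382
q17 927 491 485 485
q18 7255 6729 6656 6656
q19 1569 1320 1311 1311
q20 554 295 300 295
q21 2766 2439 2429 2429
q22 363 314 313 313
"""

Row = Tuple[str, int, int, int, int]


def parse(block: str) -> List[Row]:
    """Split each whitespace-separated line into (query, cold, hot1, hot2, best)."""
    rows: List[Row] = []
    for line in block.strip().splitlines():
        name, cold, hot1, hot2, best = line.split()
        rows.append((name, int(cold), int(hot1), int(hot2), int(best)))
    return rows


def totals(rows: List[Row]) -> Tuple[int, int]:
    """Return (total cold ms, total hot ms) after checking best == min(hot1, hot2)."""
    for name, _, hot1, hot2, best in rows:
        if best != min(hot1, hot2):
            raise ValueError(f"{name}: best {best} is not min({hot1}, {hot2})")
    return sum(r[1] for r in rows), sum(r[4] for r in rows)


if __name__ == "__main__":
    cold, hot = totals(parse(ROUND1))
    print(f"Total cold run time: {cold} ms")
    print(f"Total hot run time: {hot} ms")
```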

@doris-robot

TPC-DS: Total hot run time: 179239 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 735c594aecfbd983368dc4ac611a9e5ad767f8ac, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	937	340	329	329
query2	6718	1870	2053	1870
query3	6707	208	206	206
query4	26179	22280	22106	22106
query5	5708	500	489	489
query6	271	191	192	191
query7	4622	276	273	273
query8	231	212	205	205
query9	9111	2773	2956	2773
query10	488	241	242	241
query11	16234	15334	15379	15334
query12	137	81	76	76
query13	1634	340	343	340
query14	12102	7272	7290	7272
query15	268	192	199	192
query16	6393	275	270	270
query17	1877	490	485	485
query18	1942	284	272	272
query19	289	141	145	141
query20	91	81	78	78
query21	187	91	91	91
query22	5290	5020	4975	4975
query23	32363	31229	31342	31229
query24	12273	2848	2812	2812
query25	569	346	336	336
query26	1794	156	152	152
query27	2858	285	278	278
query28	7073	1915	1923	1915
query29	2051	397	395	395
query30	299	149	145	145
query31	985	797	786	786
query32	93	62	63	62
query33	748	272	278	272
query34	997	454	434	434
query35	892	733	788	733
query36	1371	1176	1235	1176
query37	187	69	68	68
query38	3422	3272	3301	3272
query39	1367	1284	1287	1284
query40	321	91	91	91
query41	39	34	34	34
query42	96	96	99	96
query43	524	488	506	488
query44	1094	716	719	716
query45	209	197	183	183
query46	1056	668	677	668
query47	1718	1638	1660	1638
query48	389	330	325	325
query49	1270	309	326	309
query50	736	327	333	327
query51	5391	5295	5457	5295
query52	97	87	84	84
query53	226	156	150	150
query54	1411	567	593	567
query55	101	92	86	86
query56	216	200	199	199
query57	1062	949	976	949
query58	235	214	219	214
query59	2682	2553	2577	2553
query60	252	220	238	220
query61	82	81	83	81
query62	630	448	464	448
query63	172	153	151	151
query64	5753	1763	1728	1728
query65	3357	3294	3283	3283
query66	1320	335	327	327
query67	15789	15552	15271	15271
query68	11640	522	513	513
query69	569	283	283	283
query70	1742	1613	1545	1545
query71	495	244	222	222
query72	5264	2861	2847	2847
query73	2202	317	333	317
query74	6998	6346	6457	6346
query75	4870	2254	2286	2254
query76	6356	1101	1086	1086
query77	667	271	278	271
query78	9729	8693	8693	8693
query79	1014	512	500	500
query80	551	339	360	339
query81	453	213	212	212
query82	213	87	83	83
query83	152	140	136	136
query84	272	55	57	55
query85	979	282	281	281
query86	385	392	363	363
query87	3562	3401	3399	3399
query88	2978	2235	2234	2234
query89	339	248	264	248
query90	1871	202	198	198
query91	164	135	141	135
query92	63	56	52	52
query93	1163	478	423	423
query94	756	191	198	191
query95	478	430	420	420
query96	684	319	312	312
query97	4311	4173	4193	4173
query98	212	189	189	189
query99	1133	880	884	880
Total cold run time: 295251 ms
Total hot run time: 179239 ms

@doris-robot

ClickBench: Total hot run time: 31 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 735c594aecfbd983368dc4ac611a9e5ad767f8ac, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.07	0.06	0.05
query2	0.06	0.03	0.02
query3	0.25	0.12	0.11
query4	1.76	0.13	0.11
query5	0.51	0.51	0.51
query6	1.35	0.66	0.63
query7	0.01	0.02	0.01
query8	0.04	0.02	0.02
query9	0.55	0.49	0.50
query10	0.55	0.58	0.54
query11	0.12	0.09	0.09
query12	0.12	0.10	0.10
query13	0.61	0.60	0.60
query14	0.80	0.77	0.79
query15	0.81	0.80	0.80
query16	0.34	0.35	0.36
query17	0.99	0.99	0.97
query18	0.24	0.24	0.24
query19	1.88	1.77	1.73
query20	0.01	0.00	0.01
query21	15.42	0.58	0.55
query22	2.53	2.50	2.31
query23	17.38	0.82	0.77
query24	2.41	4.03	0.73
query25	2.00	0.15	0.14
query26	0.13	0.14	0.13
query27	0.14	0.14	0.15
query28	9.87	0.81	0.79
query29	12.61	3.13	3.27
query30	0.59	0.49	0.48
query31	2.78	0.38	0.36
query32	3.37	0.48	0.48
query33	3.20	3.22	3.25
query34	15.95	4.21	4.16
query35	4.23	4.16	4.15
query36	1.12	1.04	1.04
query37	0.07	0.05	0.05
query38	0.04	0.03	0.03
query39	0.02	0.02	0.02
query40	0.15	0.16	0.13
query41	0.07	0.02	0.02
query42	0.02	0.02	0.01
query43	0.03	0.02	0.02
Total cold run time: 105.2 s
Total hot run time: 31 s

@dataroaring (Contributor) left a comment

LGTM

@github-actions (Contributor)

PR approved by at least one committer and no changes requested.

@github-actions bot added the "approved" label Jan 10, 2024 (Indicates a PR has been approved by one committer.)
@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 735c594aecfbd983368dc4ac611a9e5ad767f8ac with default session variables
Stream load json:         19 seconds loaded 2358488459 Bytes, about 118 MB/s
Stream load orc:          58 seconds loaded 1101869774 Bytes, about 18 MB/s
Stream load parquet:      32 seconds loaded 861443392 Bytes, about 25 MB/s
Insert into select:       13.7 seconds inserted 10000000 Rows, about 729K ops/s

@doris-robot

(From new machine) TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.15 seconds
stream load tsv: 567 seconds loaded 74807831229 Bytes, about 125 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 22.1 seconds inserted 10000000 Rows, about 452K ops/s
storage size: 17183694202 Bytes

@wm1581066 (Collaborator) left a comment

LGTM

@dataroaring dataroaring merged commit d482a5f into apache:master Jan 10, 2024
@sollhui sollhui deleted the insert_into_lose_data branch January 12, 2024 03:27
liaoxin01 pushed a commit that referenced this pull request Aug 14, 2025
### What problem does this PR solve?

Issue Number: DORIS-20541

Related PR: #29802

Problem Summary:

Fix an issue where the scanned rows and loaded bytes metrics were not
updated progressively in the FE.
Although the BE periodically reports execution status, the FE was
ignoring these reports due to a change introduced in PR #29802, which
skips processing reports without the isDone flag.

This fix ensures that intermediate execution reports are processed,
allowing progressive updates of scanned rows and loaded bytes during
query execution.
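A minimal sketch of the behavior this follow-up restores, assuming each report carries cumulative per-fragment counters and that only reports with isDone advance the completion counter (the follow-up does not say how the commit path is gated after the change; that part is an assumption). `StatusReport`, `LoadProgress`, and every field name are hypothetical, not the FE's real types.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class StatusReport:
    """Hypothetical periodic or final report from one BE fragment."""
    is_done: bool
    scanned_rows: int   # assumed cumulative for the fragment
    loaded_bytes: int   # assumed cumulative for the fragment


class LoadProgress:
    """Hypothetical FE-side tracker: metrics from every report, commit only on isDone."""

    def __init__(self, fragments: int) -> None:
        self.pending = fragments
        self.rows: Dict[str, int] = {}
        self.bytes: Dict[str, int] = {}
        self.commit_triggered = False

    def on_report(self, fragment_id: str, report: StatusReport) -> None:
        # Record progress for every report so metrics advance during the load.
        self.rows[fragment_id] = report.scanned_rows
        self.bytes[fragment_id] = report.loaded_bytes
        if not report.is_done:
            return  # intermediate report: metrics only, never the commit path
        self.pending -= 1
        if self.pending == 0:
            self.commit_triggered = True

    def scanned_rows(self) -> int:
        return sum(self.rows.values())

    def loaded_bytes(self) -> int:
        return sum(self.bytes.values())


if __name__ == "__main__":
    progress = LoadProgress(fragments=2)
    progress.on_report("f1", StatusReport(is_done=False, scanned_rows=100, loaded_bytes=4096))
    print(progress.scanned_rows(), progress.loaded_bytes(), progress.commit_triggered)
```

Storing the latest value per fragment, rather than adding it, keeps the totals correct under the cumulative-counter assumption; if reports carried deltas instead, the tracker would accumulate them.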
kaijchen added a commit to kaijchen/doris that referenced this pull request Aug 14, 2025
…e#54606)

Issue Number: DORIS-20541

Related PR: apache#29802

Problem Summary:

Fix an issue where the scanned rows and loaded bytes metrics were not
updated progressively in the FE.
Although the BE periodically reports execution status, the FE was
ignoring these reports due to a change introduced in PR apache#29802, which
skips processing reports without the isDone flag.

This fix ensures that intermediate execution reports are processed,
allowing progressive updates of scanned rows and loaded bytes during
query execution.
dataroaring pushed a commit that referenced this pull request Aug 15, 2025
…ively #54606 (#54790)

Backport #54606

### What problem does this PR solve?

Issue Number: DORIS-20541

Related PR: #29802

Problem Summary:

Fix an issue where the scanned rows and loaded bytes metrics were not
updated progressively in the FE.
Although the BE periodically reports execution status, the FE was
ignoring these reports due to a change introduced in PR #29802, which
skips processing reports without the isDone flag.

This fix ensures that intermediate execution reports are processed,
allowing progressive updates of scanned rows and loaded bytes during
query execution.
morrySnow pushed a commit that referenced this pull request Aug 15, 2025
…ively #54606 (#54787)

Backport #54606

### What problem does this PR solve?

Issue Number: DORIS-20541

Related PR: #29802

Problem Summary:

Fix an issue where the scanned rows and loaded bytes metrics were not
updated progressively in the FE.
Although the BE periodically reports execution status, the FE was
ignoring these reports due to a change introduced in PR #29802, which
skips processing reports without the isDone flag.

This fix ensures that intermediate execution reports are processed,
allowing progressive updates of scanned rows and loaded bytes during
query execution.
7 participants