
[fix](export) fix concurrent modification issue with export job #43051

Merged
morningman merged 1 commit into apache:master from morningman:fix_concurrent_export
Nov 1, 2024

Conversation

@morningman
Contributor

What problem does this PR solve?

Related PR: #42950

Problem Summary:

PR #42950 changed some logic in ExportJob by removing taskIdToExecutor, which was
a thread-safe ConcurrentHashMap.
However, cancelling an export job clears the jobExecutorList in ExportJob,
and this jobExecutorList may be traversed at the same time while the export job is being created,
causing a ConcurrentModificationException.
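
The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not code from Doris: the class and method names are hypothetical, and it assumes the list behaves like a plain `java.util.ArrayList`. It clears the list from inside the traversal loop, which stands in for the cancel path clearing the list while the create path is still iterating.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical demo class; not part of Apache Doris.
class ClearDuringTraversalDemo {

    // Returns true if clearing the list during traversal trips the
    // fail-fast iterator of the list.
    static boolean clearWhileTraversing(List<String> executors) {
        try {
            for (String executor : executors) {
                // Stands in for the cancel path clearing the list while
                // the create path is still iterating over it.
                executors.clear();
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> executors =
                new ArrayList<>(Arrays.asList("task-1", "task-2", "task-3"));
        System.out.println("CME thrown: " + clearWhileTraversing(executors));
    }
}
```

With two real threads the exception is timing-dependent, which is why the single-threaded form is used here to make the behavior repeatable.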

This PR fixes it by acquiring the writeLock of ExportMgr when cancelling the export job.
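
As a rough illustration of the locking approach, here is a minimal sketch under stated assumptions. It is not the actual ExportMgr or ExportJob code: the class `ExportCancelLockSketch` and its methods are hypothetical, and it assumes the create path traverses the list while holding the same write lock that the cancel path now takes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the manager/job pair; not the real Doris classes.
class ExportCancelLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Plain, non-thread-safe list standing in for jobExecutorList.
    private final List<String> jobExecutorList = new ArrayList<>();

    // Create path (assumed): fill and traverse the list under the write lock.
    int createAndSubmit(int taskCount) {
        lock.writeLock().lock();
        try {
            jobExecutorList.clear();
            for (int i = 0; i < taskCount; i++) {
                jobExecutorList.add("task-" + i);
            }
            int submitted = 0;
            for (String executor : jobExecutorList) {
                submitted++; // stands in for submitting one executor
            }
            return submitted;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Cancel path: clear the list only while holding the same write lock.
    void cancel() {
        lock.writeLock().lock();
        try {
            jobExecutorList.clear();
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Runs both paths concurrently; returns the first failure, or null if none.
    static Throwable runRace(int rounds) {
        ExportCancelLockSketch mgr = new ExportCancelLockSketch();
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread creator = new Thread(() -> {
            try {
                for (int i = 0; i < rounds; i++) {
                    if (mgr.createAndSubmit(50) != 50) {
                        throw new IllegalStateException("traversal saw a cleared list");
                    }
                }
            } catch (Throwable t) {
                failure.compareAndSet(null, t);
            }
        });
        Thread canceller = new Thread(() -> {
            try {
                for (int i = 0; i < rounds; i++) {
                    mgr.cancel();
                }
            } catch (Throwable t) {
                failure.compareAndSet(null, t);
            }
        });
        creator.start();
        canceller.start();
        try {
            creator.join();
            canceller.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return e;
        }
        return failure.get();
    }
}
```

Calling `ExportCancelLockSketch.runRace(2000)` is expected to return `null`, because the clear and the traversal can never overlap. The trade-off of this design is that cancellation waits for any in-progress creation holding the lock, and vice versa.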

Check List (For Committer)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.
  • Release note

    None

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@morningman
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 40871 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit c1a8710e0dee400f9b78b5cfedb46f4f0fdc104c, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17568	7368	7255	7255
q2	2050	165	181	165
q3	10590	1077	1177	1077
q4	10504	820	809	809
q5	7754	3028	3040	3028
q6	244	150	147	147
q7	1026	609	603	603
q8	9347	1916	1984	1916
q9	6548	6407	6456	6407
q10	7097	2447	2445	2445
q11	479	247	256	247
q12	408	212	211	211
q13	17774	3000	2968	2968
q14	238	213	210	210
q15	568	532	503	503
q16	652	597	579	579
q17	971	522	531	522
q18	7207	6653	6560	6560
q19	1332	1078	885	885
q20	462	179	172	172
q21	4035	3177	3173	3173
q22	1132	989	1003	989
Total cold run time: 107986 ms
Total hot run time: 40871 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	7335	7192	7225	7192
q2	322	225	224	224
q3	2843	2939	2945	2939
q4	2053	1871	1902	1871
q5	5766	5716	5736	5716
q6	225	141	138	138
q7	2254	1828	1845	1828
q8	3357	3551	3536	3536
q9	8852	8875	8866	8866
q10	3572	3515	3565	3515
q11	618	494	501	494
q12	803	588	620	588
q13	8761	3146	3192	3146
q14	324	287	276	276
q15	576	523	540	523
q16	674	662	657	657
q17	1858	1640	1626	1626
q18	8434	7795	7720	7720
q19	1711	1533	1525	1525
q20	2106	1884	1879	1879
q21	5542	5387	5429	5387
q22	1126	1085	1029	1029
Total cold run time: 69112 ms
Total hot run time: 60675 ms

@doris-robot

TPC-DS: Total hot run time: 195747 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c1a8710e0dee400f9b78b5cfedb46f4f0fdc104c, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	1225	940	898	898
query2	6231	2160	2140	2140
query3	10833	3996	4064	3996
query4	68178	29048	23554	23554
query5	5008	447	431	431
query6	401	175	167	167
query7	5589	291	290	290
query8	314	229	224	224
query9	9015	2714	2741	2714
query10	470	280	265	265
query11	17509	15434	15821	15434
query12	168	102	106	102
query13	1542	426	443	426
query14	9992	7280	6661	6661
query15	200	176	173	173
query16	7042	441	443	441
query17	1022	558	565	558
query18	1797	297	294	294
query19	191	155	146	146
query20	117	109	113	109
query21	199	99	99	99
query22	4579	4325	4414	4325
query23	34241	34068	34332	34068
query24	5968	2773	2723	2723
query25	514	387	400	387
query26	648	156	154	154
query27	1675	276	297	276
query28	4044	2470	2423	2423
query29	688	430	416	416
query30	229	157	152	152
query31	981	795	804	795
query32	63	56	61	56
query33	430	272	269	269
query34	920	491	504	491
query35	865	730	748	730
query36	1072	932	944	932
query37	119	72	74	72
query38	4340	4232	4227	4227
query39	1502	1428	1405	1405
query40	199	100	99	99
query41	47	44	46	44
query42	113	98	100	98
query43	542	505	502	502
query44	1159	817	803	803
query45	181	169	163	163
query46	1115	713	699	699
query47	1976	1860	1894	1860
query48	411	324	324	324
query49	739	407	399	399
query50	808	406	391	391
query51	7287	7294	7137	7137
query52	100	117	86	86
query53	248	179	181	179
query54	516	405	395	395
query55	81	74	70	70
query56	262	232	230	230
query57	1284	1194	1163	1163
query58	209	198	216	198
query59	3189	3040	3067	3040
query60	269	246	231	231
query61	102	101	104	101
query62	815	669	660	660
query63	218	192	182	182
query64	1321	630	635	630
query65	3258	3178	3235	3178
query66	708	296	295	295
query67	16021	15803	15676	15676
query68	3497	584	606	584
query69	410	248	266	248
query70	1198	1148	1128	1128
query71	363	261	251	251
query72	6335	3984	3937	3937
query73	773	357	366	357
query74	10085	8949	9125	8949
query75	3404	2646	2670	2646
query76	1786	1114	1086	1086
query77	506	280	273	273
query78	10521	9469	9373	9373
query79	1602	603	604	603
query80	878	418	421	418
query81	513	237	239	237
query82	1251	115	111	111
query83	222	132	141	132
query84	274	66	71	66
query85	903	285	309	285
query86	333	307	280	280
query87	4779	4565	4680	4565
query88	3601	2191	2185	2185
query89	420	289	283	283
query90	2019	180	180	180
query91	131	94	99	94
query92	63	47	47	47
query93	1906	548	550	548
query94	796	285	290	285
query95	343	243	246	243
query96	609	284	284	284
query97	2905	2729	2745	2729
query98	210	199	197	197
query99	1592	1295	1287	1287
Total cold run time: 317327 ms
Total hot run time: 195747 ms

@doris-robot

ClickBench: Total hot run time: 32.75 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit c1a8710e0dee400f9b78b5cfedb46f4f0fdc104c, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.03	0.03	0.03
query2	0.07	0.04	0.03
query3	0.23	0.07	0.07
query4	1.64	0.10	0.10
query5	0.41	0.41	0.42
query6	1.17	0.65	0.65
query7	0.03	0.02	0.02
query8	0.04	0.03	0.03
query9	0.58	0.51	0.49
query10	0.54	0.55	0.55
query11	0.14	0.11	0.11
query12	0.13	0.11	0.12
query13	0.60	0.61	0.60
query14	2.74	2.73	2.86
query15	0.90	0.82	0.82
query16	0.38	0.38	0.38
query17	1.00	1.07	1.10
query18	0.20	0.20	0.20
query19	1.89	1.87	1.88
query20	0.01	0.01	0.01
query21	15.36	0.58	0.55
query22	2.84	2.06	1.65
query23	17.11	0.87	0.88
query24	3.20	1.50	1.19
query25	0.26	0.08	0.23
query26	0.40	0.13	0.13
query27	0.04	0.04	0.04
query28	10.22	1.11	1.08
query29	12.54	3.23	3.23
query30	0.24	0.06	0.06
query31	2.86	0.39	0.37
query32	3.29	0.47	0.46
query33	3.00	3.02	3.02
query34	16.88	4.47	4.47
query35	4.50	4.50	4.49
query36	0.68	0.48	0.48
query37	0.08	0.06	0.06
query38	0.04	0.03	0.04
query39	0.03	0.02	0.03
query40	0.15	0.12	0.12
query41	0.08	0.02	0.02
query42	0.04	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 106.61 s
Total hot run time: 32.75 s

@github-actions
Contributor

github-actions bot commented Nov 1, 2024

PR approved by anyone and no changes requested.

@github-actions
Contributor

github-actions bot commented Nov 1, 2024

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Nov 1, 2024
@morningman morningman merged commit 77ac531 into apache:master Nov 1, 2024
github-actions bot pushed a commit that referenced this pull request Nov 1, 2024
morningman added a commit to morningman/doris that referenced this pull request Nov 6, 2024
…he#43051)

morningman added a commit that referenced this pull request Nov 6, 2024

Labels

approved Indicates a PR has been approved by one committer. dev/2.1.7-merged dev/3.0.3-merged p0_b reviewed


7 participants
