[fix](variant) fix batch insert into with structure-conflicts strings#53923

Merged: eldenmoon merged 4 commits into apache:master from amorynan:fix-batch-insert-for-typeconflict on Jul 29, 2025

@amorynan
Contributor

What problem does this PR solve?

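Fix batch INSERT INTO on a variant column when the JSON strings in the batch have conflicting structures (the same path appears both inside an array of objects and as a plain object field).
Before this change, such an insert succeeded, but subsequent queries on the table failed with an error like this: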

mysql> insert into var_nested_load_conflict values (3, '{"nested": [{"a": 2.5, "b": "123.1"}]}'),  (4, '{"nested": {"a": 2.5, "b": "123.1"}}');
Query OK, 2 rows affected (0.16 sec)
{'label':'label_9279242ae3fd40e2_aabe077db2d37bb9', 'status':'VISIBLE', 'txnId':'16028'}

mysql> desc var_nested_load_conflict;
+------------+---------------+------+-------+---------+-------+
| Field      | Type          | Null | Key   | Default | Extra |
+------------+---------------+------+-------+---------+-------+
| k          | bigint        | Yes  | true  | NULL    |       |
| v          | variant       | Yes  | false | NULL    | NONE  |
| v.nested.a | json          | Yes  | false | NULL    | NONE  |
| v.nested.b | json          | Yes  | false | NULL    | NONE  |
| v.nested.c | array<double> | Yes  | false | NULL    | NONE  |
+------------+---------------+------+-------+---------+-------+
5 rows in set (0.10 sec)

mysql> select * from var_nested_load_conflict;
ERROR 1105 (HY000): errCode = 2, detailMessage = (10.16.10.6)[INTERNAL_ERROR]Meet none array column when flatten nested array, path nested.b, type Nullable(JSONB)

With this change, this kind of insert is rejected instead:

mysql> insert into var_nested_load_conflict values (3, '{"nested": [{"a": 2.5, "b": "123.1"}]}'),  (4, '{"nested": {"a": 2.5, "b": "123.1"}}');
ERROR 1105 (HY000): errCode = 2, detailMessage = (10.16.10.6)[DATA_QUALITY_ERROR][E46] Ambiguous paths: nested.b vs nested.b with different nested part false vs true
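
The kind of check involved can be illustrated with a minimal Python sketch. This is not the Doris implementation (which lives in the C++ backend); the helper names `collect_paths` and `find_conflict` are hypothetical, and the sketch only assumes that a path counts as "nested" when it passes through an array of objects. It reports the first path whose nested flag differs between rows of one batch, so the path it names may differ from the one in the Doris error message.

```python
import json


def collect_paths(value, prefix=(), nested=False):
    """Yield (dotted_path, nested_flag) for every leaf of a parsed JSON value.

    nested_flag is True when the path passes through an array of objects.
    (Hypothetical helper for illustration only.)
    """
    if isinstance(value, dict):
        for key, child in value.items():
            yield from collect_paths(child, prefix + (key,), nested)
    elif isinstance(value, list) and any(isinstance(item, dict) for item in value):
        # An array of objects marks every path below it as nested.
        for item in value:
            yield from collect_paths(item, prefix, True)
    else:
        # Scalars and arrays of scalars are treated as leaves.
        yield ".".join(prefix), nested


def find_conflict(rows):
    """Return (path, first_flag, conflicting_flag) for the first path whose
    nested flag differs between rows of the batch, or None if all agree."""
    seen = {}
    for row in rows:
        for path, nested in collect_paths(json.loads(row)):
            previous = seen.setdefault(path, nested)
            if previous != nested:
                return path, previous, nested
    return None


batch = [
    '{"nested": [{"a": 2.5, "b": "123.1"}]}',  # nested is an array of objects
    '{"nested": {"a": 2.5, "b": "123.1"}}',    # nested is a plain object
]
conflict = find_conflict(batch)
if conflict is not None:
    path, first, second = conflict
    print(f"Ambiguous path: {path} with different nested part {first} vs {second}")
```

Rejecting the whole batch on the first conflict mirrors the behavior described above: the two rows cannot be stored under one consistent layout for `nested.a` and `nested.b`, so failing the insert is safer than accepting data that cannot be read back.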

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Jul 25, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@amorynan
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 33614 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 43d587c2c97d4617dadea8a987f6030405dd8ae0, data reload: false

------ Round 1 ----------------------------------
q1	17620	5244	5096	5096
q2	1904	273	170	170
q3	10341	1274	693	693
q4	10205	1006	519	519
q5	7530	2364	2286	2286
q6	174	157	127	127
q7	872	724	599	599
q8	9290	1257	1027	1027
q9	6763	5031	5035	5031
q10	6885	2347	1944	1944
q11	469	287	291	287
q12	338	345	217	217
q13	17764	3683	3135	3135
q14	226	234	203	203
q15	553	479	487	479
q16	419	429	383	383
q17	569	848	341	341
q18	7450	7111	7101	7101
q19	1397	927	528	528
q20	334	334	221	221
q21	3582	3119	2261	2261
q22	1047	1022	966	966
Total cold run time: 105732 ms
Total hot run time: 33614 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5168	5141	5130	5130
q2	243	313	214	214
q3	2142	2657	2319	2319
q4	1384	1772	1411	1411
q5	4185	4365	4448	4365
q6	207	166	126	126
q7	1967	1988	1773	1773
q8	2637	2596	2631	2596
q9	7224	7177	7255	7177
q10	3157	3291	2863	2863
q11	577	519	503	503
q12	747	836	629	629
q13	3819	4041	3434	3434
q14	301	313	279	279
q15	528	476	503	476
q16	463	487	445	445
q17	1160	1590	1410	1410
q18	7990	7837	7776	7776
q19	788	768	792	768
q20	2038	2061	1840	1840
q21	4734	4390	4277	4277
q22	1094	1047	956	956
Total cold run time: 52553 ms
Total hot run time: 50767 ms

@doris-robot

TPC-DS: Total hot run time: 186665 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 43d587c2c97d4617dadea8a987f6030405dd8ae0, data reload: false

query1	988	383	415	383
query2	6558	1725	1744	1725
query3	6751	230	216	216
query4	26505	23276	23142	23142
query5	4340	615	515	515
query6	312	242	205	205
query7	4635	505	289	289
query8	288	239	222	222
query9	8584	2911	2851	2851
query10	463	329	303	303
query11	16014	15198	14760	14760
query12	151	116	107	107
query13	1639	505	414	414
query14	9046	5819	5836	5819
query15	210	187	169	169
query16	7224	634	472	472
query17	1170	735	568	568
query18	1974	408	313	313
query19	186	182	155	155
query20	128	120	112	112
query21	211	119	103	103
query22	4178	4077	4056	4056
query23	33651	32800	32824	32800
query24	8097	2407	2364	2364
query25	546	494	418	418
query26	1241	268	163	163
query27	2735	496	348	348
query28	4339	2217	2197	2197
query29	793	562	457	457
query30	290	228	196	196
query31	907	796	686	686
query32	85	76	76	76
query33	552	367	343	343
query34	783	843	513	513
query35	814	818	727	727
query36	981	1032	914	914
query37	119	109	93	93
query38	4251	4104	4019	4019
query39	1528	1429	1421	1421
query40	229	129	120	120
query41	86	64	59	59
query42	125	114	114	114
query43	532	487	484	484
query44	1393	871	866	866
query45	178	170	171	170
query46	842	988	648	648
query47	1761	1802	1709	1709
query48	391	445	325	325
query49	752	498	419	419
query50	644	698	420	420
query51	5358	5572	5384	5384
query52	114	111	106	106
query53	233	277	198	198
query54	612	622	545	545
query55	93	133	93	93
query56	324	327	308	308
query57	1179	1194	1111	1111
query58	287	273	280	273
query59	2552	2613	2560	2560
query60	348	344	327	327
query61	132	121	127	121
query62	810	727	665	665
query63	223	188	193	188
query64	4375	997	685	685
query65	4262	4165	4158	4158
query66	1156	407	349	349
query67	15598	15757	15535	15535
query68	8051	910	581	581
query69	484	322	289	289
query70	1250	1173	1120	1120
query71	460	337	318	318
query72	5527	4762	4696	4696
query73	735	578	358	358
query74	8893	9032	8737	8737
query75	3796	3086	2641	2641
query76	3602	1127	771	771
query77	791	382	329	329
query78	9530	9677	8925	8925
query79	2180	843	593	593
query80	616	549	485	485
query81	481	258	227	227
query82	423	142	110	110
query83	257	253	234	234
query84	244	99	89	89
query85	832	369	325	325
query86	345	353	309	309
query87	4388	4440	4297	4297
query88	3155	2281	2270	2270
query89	390	312	295	295
query90	1941	237	225	225
query91	137	148	114	114
query92	92	76	66	66
query93	1133	958	653	653
query94	687	403	302	302
query95	398	323	324	323
query96	493	593	284	284
query97	2679	2723	2647	2647
query98	239	222	217	217
query99	1602	1386	1261	1261
Total cold run time: 273687 ms
Total hot run time: 186665 ms

@doris-robot

ClickBench: Total hot run time: 32.03 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 43d587c2c97d4617dadea8a987f6030405dd8ae0, data reload: false

query1	0.04	0.03	0.04
query2	0.08	0.04	0.04
query3	0.24	0.07	0.07
query4	1.63	0.11	0.11
query5	0.44	0.42	0.40
query6	1.16	0.65	0.67
query7	0.02	0.02	0.01
query8	0.05	0.04	0.04
query9	0.59	0.52	0.53
query10	0.58	0.57	0.58
query11	0.16	0.11	0.11
query12	0.15	0.11	0.12
query13	0.62	0.61	0.62
query14	0.79	0.83	0.83
query15	0.91	0.85	0.88
query16	0.39	0.39	0.38
query17	1.07	1.05	1.03
query18	0.21	0.19	0.19
query19	1.95	1.81	1.85
query20	0.01	0.01	0.02
query21	15.38	0.96	0.53
query22	0.80	1.18	0.69
query23	14.91	1.37	0.65
query24	6.28	2.66	0.41
query25	0.34	0.22	0.07
query26	0.52	0.16	0.13
query27	0.05	0.06	0.06
query28	9.85	0.97	0.45
query29	12.57	3.99	3.28
query30	3.11	3.04	2.95
query31	2.82	0.61	0.39
query32	3.23	0.57	0.49
query33	3.19	3.02	3.17
query34	15.96	5.52	4.81
query35	4.92	4.89	4.92
query36	0.69	0.51	0.49
query37	0.09	0.07	0.07
query38	0.05	0.04	0.04
query39	0.03	0.02	0.03
query40	0.17	0.15	0.14
query41	0.08	0.03	0.03
query42	0.03	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 106.2 s
Total hot run time: 32.03 s

@doris-robot

BE UT Coverage Report

Increment line coverage 100.00% (6/6) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 57.65% (15966/27694)
Line Coverage 46.42% (143534/309218)
Region Coverage 35.83% (108098/301718)
Branch Coverage 38.36% (47721/124405)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (6/6) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 81.04% (22034/27188)
Line Coverage 73.67% (227502/308794)
Region Coverage 61.22% (189465/309470)
Branch Coverage 65.09% (81890/125813)

@amorynan
Contributor Author

run buildall

Member

only check when enable_nested

Contributor Author

done!

@amorynan
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 33816 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 3b3f80dc214765402530ad5ecd8d74c3cdffc390, data reload: false

------ Round 1 ----------------------------------
q1	17590	5383	5233	5233
q2	1946	282	189	189
q3	10334	1469	692	692
q4	10257	957	521	521
q5	8323	2290	2315	2290
q6	181	165	131	131
q7	864	763	610	610
q8	9293	1233	1043	1043
q9	7065	5096	5134	5096
q10	6917	2359	1982	1982
q11	472	278	266	266
q12	339	361	209	209
q13	17800	3480	3053	3053
q14	232	233	225	225
q15	528	464	467	464
q16	436	439	381	381
q17	572	830	356	356
q18	7344	7164	7101	7101
q19	2491	978	526	526
q20	675	325	224	224
q21	3408	2404	2244	2244
q22	1038	1050	980	980
Total cold run time: 108105 ms
Total hot run time: 33816 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5480	5210	5306	5210
q2	597	303	211	211
q3	2150	2567	2158	2158
q4	1316	1700	1315	1315
q5	4135	4432	4413	4413
q6	215	195	136	136
q7	1998	1900	1686	1686
q8	2638	2648	2481	2481
q9	7244	7249	7219	7219
q10	3188	3357	2876	2876
q11	586	565	511	511
q12	750	813	589	589
q13	3519	3810	3408	3408
q14	300	311	288	288
q15	504	470	457	457
q16	435	541	457	457
q17	1148	1444	1314	1314
q18	7745	7843	7875	7843
q19	6845	854	856	854
q20	2965	1976	1774	1774
q21	7322	4354	4240	4240
q22	1084	1025	958	958
Total cold run time: 62164 ms
Total hot run time: 50398 ms

@doris-robot

TPC-DS: Total hot run time: 170919 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 3b3f80dc214765402530ad5ecd8d74c3cdffc390, data reload: false

============================================
query1	1001	376	403	376
query2	6520	1801	1672	1672
query3	6752	234	227	227
query4	26802	23549	22843	22843
query5	4366	641	530	530
query6	319	236	239	236
query7	4634	517	299	299
query8	281	233	236	233
query9	8601	2957	2966	2957
query10	452	362	300	300
query11	15664	14977	14760	14760
query12	182	137	129	129
query13	1664	555	412	412
query14	8628	5929	6043	5929
query15	217	205	179	179
query16	7645	625	431	431
query17	1478	791	653	653
query18	2079	470	334	334
query19	246	221	196	196
query20	159	163	143	143
query21	217	127	110	110
query22	4071	4036	3845	3845
query23	34255	33987	34108	33987
query24	7689	2439	2455	2439
query25	523	506	435	435
query26	714	316	164	164
query27	2331	513	345	345
query28	2988	2272	2278	2272
query29	626	609	489	489
query30	285	225	199	199
query31	882	795	715	715
query32	89	80	77	77
query33	552	399	385	385
query34	809	862	532	532
query35	803	845	740	740
query36	1036	1053	929	929
query37	134	110	87	87
query38	3996	4079	3963	3963
query39	1445	1372	1358	1358
query40	250	147	134	134
query41	60	59	53	53
query42	140	123	129	123
query43	506	495	474	474
query44	1398	891	877	877
query45	207	193	183	183
query46	950	1061	670	670
query47	1783	1820	1755	1755
query48	417	423	318	318
query49	681	495	412	412
query50	680	687	425	425
query51	5524	5507	5442	5442
query52	129	139	128	128
query53	263	294	219	219
query54	668	654	570	570
query55	95	95	90	90
query56	364	357	378	357
query57	1202	1216	1132	1132
query58	333	325	342	325
query59	2686	2622	2442	2442
query60	388	406	389	389
query61	124	121	120	120
query62	741	744	656	656
query63	253	219	217	217
query64	2369	1080	803	803
query65	4244	4076	4120	4076
query66	874	506	347	347
query67
query68	18562	640	614	614
query69	1016	329	308	308
query70	1419	1124	1097	1097
query71	721	356	334	334
query72	9213	2260	2352	2260
query73	3781	641	360	360
query74	8962	8844	8808	8808
query75	7742	3139	2697	2697
query76	8845	1210	768	768
query77	1173	433	345	345
query78	9617	10219	9311	9311
query79	14799	661	602	602
query80	2215	569	546	546
query81	574	261	230	230
query82	518	155	118	118
query83	340	295	268	268
query84	309	103	82	82
query85	862	385	334	334
query86	359	342	299	299
query87	4363	4293	4163	4163
query88	5397	2310	2321	2310
query89	482	370	323	323
query90	2502	241	228	228
query91	151	138	112	112
query92	89	72	67	67
query93	6527	969	661	661
query94	1378	401	286	286
query95	432	329	326	326
query96	514	601	288	288
query97	2747	2759	2612	2612
query98	250	249	228	228
query99	1486	1440	1287	1287
Total cold run time: 305809 ms
Total hot run time: 170919 ms

@doris-robot

BE UT Coverage Report

Increment line coverage 100.00% (8/8) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 57.68% (15994/27729)
Line Coverage 46.40% (143775/309839)
Region Coverage 35.82% (108459/302774)
Branch Coverage 38.36% (47855/124750)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (8/8) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 81.20% (22103/27220)
Line Coverage 73.84% (228479/309426)
Region Coverage 61.52% (191026/310523)
Branch Coverage 65.27% (82347/126158)

@amorynan
Contributor Author

run performance

Member

@eldenmoon left a comment

LGTM

@github-actions bot added the "approved" label (indicates a PR has been approved by one committer) on Jul 28, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

Contributor

@csun5285 left a comment

LGTM

@eldenmoon eldenmoon merged commit eaab105 into apache:master Jul 29, 2025
27 of 30 checks passed
amorynan pushed a commit to amorynan/doris that referenced this pull request Jul 29, 2025
…apache#53923)

Labels: approved (indicates a PR has been approved by one committer), reviewed

6 participants

Comments