branch-3.0 cherry-pick [fix](variant) fix the reading core caused by inserting nested column and scalar column in variant sub-column #53897

Merged
dataroaring merged 8 commits into apache:branch-3.0 from amorynan:pick_53083_to_doris_branch-3.0 on Aug 12, 2025

@amorynan
Contributor

What problem does this PR solve?

backport: #53083
Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

… and scalar column in variant sub-column (apache#53083)

This PR mainly fixes a problem that occurs when a table is created with `variant_enable_flatten_nested` and the following variant data is then inserted:
`'{"nested":{"a":"1"}}'` and `'{"nested":[{"a":1,"c":1.1},{"b":"1"}]}'`
Reading this data causes a core dump, so this table property should be forbidden.
For existing data, inserting data with a different structure fails with an error like this:
```
mysql> insert into vs values (2, '{"nested":{"a":"1"}}');
Query OK, 1 row affected (0.22 sec)
{'label':'label_165a8209698c4391_988c6532615017c4', 'status':'VISIBLE', 'txnId':'1011'}

mysql> insert into vs values (1, '{"nested":[{"a":1,"c":1.1},{"b":"1"}]}');
ERROR 1105 (HY000): errCode = 2, detailMessage = (10.16.10.6)[INTERNAL_ERROR]tablet 1752145213719 failed on majority backends: [DATA_QUALITY_ERROR]PStatus: (10.16.10.6)[DATA_QUALITY_ERROR]Ambiguous paths: v.nested.a vs v.nested.a with different nested part true vs false
```
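
The conflict reported above can be illustrated with a minimal Python sketch. This is not the Doris implementation: the helper names `leaf_paths` and `find_conflicts` are hypothetical, and the rule that only a non-empty array of objects counts as "nested" is an assumption made for illustration. It only shows why the same path `nested.a` ends up with two incompatible shapes.

```python
import json


def leaf_paths(value, prefix=(), nested=False):
    """Yield (dotted_path, nested) for every leaf of a parsed JSON value.

    `nested` is True when the leaf was reached through an array of objects.
    """
    if isinstance(value, dict):
        for key, child in value.items():
            yield from leaf_paths(child, prefix + (key,), nested)
    elif isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
        # Array of objects: its leaves keep the same path but count as nested.
        for item in value:
            yield from leaf_paths(item, prefix, True)
    else:
        # Scalars, empty arrays, and arrays of scalars are treated as plain leaves.
        yield ".".join(prefix), nested


def find_conflicts(doc_a, doc_b):
    """Return sorted paths present in both documents with different nested flags."""
    a = dict(leaf_paths(json.loads(doc_a)))
    b = dict(leaf_paths(json.loads(doc_b)))
    return sorted(path for path in a.keys() & b.keys() if a[path] != b[path])


# The two values from the description above disagree on `nested.a`.
print(find_conflicts('{"nested":{"a":"1"}}', '{"nested":[{"a":1,"c":1.1},{"b":"1"}]}'))
# prints ['nested.a']
```

In the sketch, `nested.b` and `nested.c` appear only in the second document, so only `nested.a` is flagged.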
@amorynan amorynan requested a review from dataroaring as a code owner July 25, 2025 06:53
@Thearas
Contributor

Thearas commented Jul 25, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@amorynan
Contributor Author

run buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 80.98% (1171/1446)
Line Coverage 65.01% (19953/30690)
Region Coverage 66.58% (10061/15112)
Branch Coverage 56.19% (5308/9446)

amory added 2 commits July 29, 2025 15:58
…apache#53923)

Fix batch INSERT INTO with structure-conflicting strings.
Before this change, such an insert succeeded, but a later query failed with an error like this:
```
mysql> insert into var_nested_load_conflict values (3, '{"nested": [{"a": 2.5, "b": "123.1"}]}'),  (4, '{"nested": {"a": 2.5, "b": "123.1"}}');
Query OK, 2 rows affected (0.16 sec)
{'label':'label_9279242ae3fd40e2_aabe077db2d37bb9', 'status':'VISIBLE', 'txnId':'16028'}

mysql> desc var_nested_load_conflict;
+------------+---------------+------+-------+---------+-------+
| Field      | Type          | Null | Key   | Default | Extra |
+------------+---------------+------+-------+---------+-------+
| k          | bigint        | Yes  | true  | NULL    |       |
| v          | variant       | Yes  | false | NULL    | NONE  |
| v.nested.a | json          | Yes  | false | NULL    | NONE  |
| v.nested.b | json          | Yes  | false | NULL    | NONE  |
| v.nested.c | array<double> | Yes  | false | NULL    | NONE  |
+------------+---------------+------+-------+---------+-------+
5 rows in set (0.10 sec)

mysql> select * from var_nested_load_conflict;
ERROR 1105 (HY000): errCode = 2, detailMessage = (10.16.10.6)[INTERNAL_ERROR]Meet none array column when flatten nested array, path nested.b, type Nullable(JSONB)
```

So this kind of insertion is no longer allowed:
```
mysql> insert into var_nested_load_conflict values (3, '{"nested": [{"a": 2.5, "b": "123.1"}]}'),  (4, '{"nested": {"a": 2.5, "b": "123.1"}}');
ERROR 1105 (HY000): errCode = 2, detailMessage = (10.16.10.6)[DATA_QUALITY_ERROR][E46] Ambiguous paths: nested.b vs nested.b with different nested part false vs true
```
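
The batch-level rejection can be sketched as a validation pass that runs over every row before anything is written. This is a hedged illustration, not Doris code: `validate_batch`, `AmbiguousPathError`, and the error wording are hypothetical, and the sketch may flag a different path first than the server does.

```python
import json


class AmbiguousPathError(ValueError):
    """Raised when one path is seen both inside and outside an array of objects."""


def _walk(value, path, nested, seen):
    if isinstance(value, dict):
        for key, child in value.items():
            _walk(child, f"{path}.{key}" if path else key, nested, seen)
    elif isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
        # Leaves under an array of objects keep the path but are marked nested.
        for item in value:
            _walk(item, path, True, seen)
    else:
        previous = seen.setdefault(path, nested)
        if previous != nested:
            raise AmbiguousPathError(
                f"ambiguous path {path!r}: nested={previous} vs nested={nested}"
            )


def validate_batch(rows):
    """Check all (key, json_text) rows; raise before any row would be written."""
    seen = {}
    for _key, doc in rows:
        _walk(json.loads(doc), "", False, seen)
    return seen


batch = [
    (3, '{"nested": [{"a": 2.5, "b": "123.1"}]}'),
    (4, '{"nested": {"a": 2.5, "b": "123.1"}}'),
]
try:
    validate_batch(batch)
except AmbiguousPathError as err:
    print("rejected:", err)
```

Validating the whole batch up front means a conflicting row causes the entire statement to fail, rather than leaving data that can be inserted but not read.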
@amorynan
Contributor Author

run buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 80.98% (1171/1446)
Line Coverage 65.00% (19948/30690)
Region Coverage 66.56% (10059/15112)
Branch Coverage 56.17% (5306/9446)

@doris-robot

TPC-H: Total hot run time: 39587 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ea10940351f265ec82a0bbf130c9a78e0ac50820, data reload: false

------ Round 1 ----------------------------------
q1	17981	7309	6642	6642
q2	2055	159	156	156
q3	10649	1143	1199	1143
q4	10259	728	657	657
q5	7749	2879	2833	2833
q6	212	130	130	130
q7	1000	621	604	604
q8	9352	1924	2041	1924
q9	6712	6394	6386	6386
q10	7027	2222	2300	2222
q11	469	262	248	248
q12	406	215	211	211
q13	17798	2952	2968	2952
q14	228	205	202	202
q15	515	472	448	448
q16	481	380	373	373
q17	980	552	598	552
q18	7330	6649	6537	6537
q19	1418	1031	1054	1031
q20	482	197	196	196
q21	4436	3210	3165	3165
q22	1116	994	975	975
Total cold run time: 108655 ms
Total hot run time: 39587 ms

----- Round 2, with runtime_filter_mode=off -----
q1	6747	6530	6547	6530
q2	329	229	230	229
q3	2929	2930	2920	2920
q4	2058	1872	1807	1807
q5	5717	5698	5759	5698
q6	207	132	130	130
q7	2264	1813	1824	1813
q8	3356	3584	3502	3502
q9	8792	8959	8802	8802
q10	3547	3539	3503	3503
q11	595	517	490	490
q12	820	602	591	591
q13	8266	3138	3161	3138
q14	307	270	265	265
q15	507	459	474	459
q16	483	431	442	431
q17	1856	1630	1593	1593
q18	8291	7797	7742	7742
q19	1672	1538	1560	1538
q20	2089	1881	1851	1851
q21	5291	5053	4955	4955
q22	1197	1070	1022	1022
Total cold run time: 67320 ms
Total hot run time: 59009 ms

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 33.33% (2/6) 🎉
Increment coverage report
Complete coverage report

@doris-robot

TPC-DS: Total hot run time: 197409 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit ea10940351f265ec82a0bbf130c9a78e0ac50820, data reload: false

query1	1316	913	914	913
query2	6797	1914	1888	1888
query3	10925	4547	4584	4547
query4	32810	23728	23488	23488
query5	4096	460	449	449
query6	275	175	184	175
query7	3985	327	317	317
query8	290	225	212	212
query9	9455	2574	2556	2556
query10	475	266	252	252
query11	17883	15266	15064	15064
query12	153	105	103	103
query13	1554	438	423	423
query14	8924	7335	7334	7334
query15	250	196	193	193
query16	8046	556	470	470
query17	1632	587	630	587
query18	2162	312	312	312
query19	260	168	168	168
query20	136	116	118	116
query21	209	110	111	110
query22	4802	4397	4446	4397
query23	35414	34302	34147	34147
query24	11424	2962	2968	2962
query25	642	430	428	428
query26	1241	176	177	176
query27	2760	356	356	356
query28	7473	2162	2203	2162
query29	874	485	476	476
query30	263	156	162	156
query31	1056	847	834	834
query32	94	54	54	54
query33	757	307	300	300
query34	1019	510	521	510
query35	867	731	728	728
query36	1088	943	938	938
query37	127	66	66	66
query38	4130	3974	3995	3974
query39	1536	1504	1443	1443
query40	202	96	99	96
query41	48	49	49	49
query42	119	102	100	100
query43	533	475	473	473
query44	1270	831	841	831
query45	187	173	170	170
query46	1174	739	721	721
query47	1979	1881	1929	1881
query48	491	389	379	379
query49	915	410	418	410
query50	842	418	422	418
query51	7503	7142	7255	7142
query52	106	90	91	90
query53	255	184	180	180
query54	1149	469	484	469
query55	77	81	79	79
query56	270	247	242	242
query57	1306	1206	1234	1206
query58	235	211	213	211
query59	3192	2965	2985	2965
query60	290	261	262	261
query61	111	110	107	107
query62	886	700	686	686
query63	230	193	197	193
query64	4105	690	636	636
query65	3346	3273	3212	3212
query66	839	297	291	291
query67	15986	15528	15679	15528
query68	4604	586	558	558
query69	456	261	260	260
query70	1177	1115	1096	1096
query71	345	257	258	257
query72	6300	3939	3954	3939
query73	773	356	354	354
query74	10262	9200	9197	9197
query75	3345	2644	2685	2644
query76	2670	1159	1160	1159
query77	388	271	275	271
query78	10720	9532	9646	9532
query79	1790	589	587	587
query80	1188	423	415	415
query81	560	223	223	223
query82	919	87	87	87
query83	220	143	140	140
query84	230	79	75	75
query85	1293	296	282	282
query86	439	296	305	296
query87	4451	4204	4216	4204
query88	3758	2396	2351	2351
query89	414	285	291	285
query90	1959	183	181	181
query91	175	148	149	148
query92	65	52	52	52
query93	2522	553	560	553
query94	832	300	292	292
query95	363	253	251	251
query96	637	283	289	283
query97	3274	3140	3149	3140
query98	214	201	199	199
query99	1532	1288	1288	1288
Total cold run time: 302721 ms
Total hot run time: 197409 ms

@amorynan
Contributor Author

run buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 80.98% (1171/1446)
Line Coverage 65.04% (19961/30690)
Region Coverage 66.54% (10055/15112)
Branch Coverage 56.24% (5312/9446)

@doris-robot

TPC-H: Total hot run time: 39985 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 6f8bcea4c0f2c7ea9808be4a5d73376453576927, data reload: false

------ Round 1 ----------------------------------
q1	17627	6815	6638	6638
q2	2069	190	160	160
q3	10523	1143	1158	1143
q4	10223	759	686	686
q5	7720	2927	2811	2811
q6	219	134	134	134
q7	966	622	602	602
q8	9347	1988	2012	1988
q9	6712	6393	6454	6393
q10	7001	2266	2288	2266
q11	450	259	265	259
q12	397	212	208	208
q13	17767	3016	3008	3008
q14	252	206	210	206
q15	514	468	487	468
q16	469	376	381	376
q17	972	602	567	567
q18	7374	6680	6688	6680
q19	1389	1063	1064	1063
q20	509	206	200	200
q21	4271	3131	3368	3131
q22	1109	1024	998	998
Total cold run time: 107880 ms
Total hot run time: 39985 ms

----- Round 2, with runtime_filter_mode=off -----
q1	6592	6642	6574	6574
q2	333	235	228	228
q3	2926	2977	2890	2890
q4	2039	1795	1778	1778
q5	5687	5749	5556	5556
q6	210	129	131	129
q7	2209	1871	1793	1793
q8	3396	3493	3509	3493
q9	8866	8889	8893	8889
q10	3653	3577	3576	3576
q11	591	487	491	487
q12	807	595	621	595
q13	8678	3052	3019	3019
q14	287	275	263	263
q15	505	459	467	459
q16	487	451	438	438
q17	1855	1615	1627	1615
q18	8084	7832	7839	7832
q19	1693	1562	1626	1562
q20	2112	1901	1867	1867
q21	5363	5160	5109	5109
q22	1132	1049	991	991
Total cold run time: 67505 ms
Total hot run time: 59143 ms

@doris-robot

BE UT Coverage Report

Increment line coverage 62.50% (75/120) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 42.06% (11181/26583)
Line Coverage 32.57% (95845/294239)
Region Coverage 31.72% (49542/156181)
Branch Coverage 28.17% (25415/90236)

@amorynan
Contributor Author

run p0

@amorynan
Contributor Author

run cloud_p0

@amorynan
Contributor Author

run external

@amorynan
Contributor Author

run buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 80.98% (1171/1446)
Line Coverage 65.03% (19958/30690)
Region Coverage 66.60% (10064/15112)
Branch Coverage 56.18% (5307/9446)

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 62.50% (75/120) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 42.06% (11182/26583)
Line Coverage 32.57% (95856/294284)
Region Coverage 31.71% (49540/156223)
Branch Coverage 28.17% (25421/90236)

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 33.33% (2/6) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 83.33% (100/120) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.54% (18771/26237)
Line Coverage 63.94% (187925/293919)
Region Coverage 61.99% (112170/180935)
Branch Coverage 55.07% (56178/102020)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 83.33% (100/120) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.54% (18771/26237)
Line Coverage 63.93% (187911/293919)
Region Coverage 61.99% (112164/180935)
Branch Coverage 55.06% (56171/102020)

@amorynan
Contributor Author

run buildall

@doris-robot

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 80.98% (1171/1446)
Line Coverage 65.01% (19951/30690)
Region Coverage 66.58% (10062/15112)
Branch Coverage 56.19% (5308/9446)

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 62.50% (75/120) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 42.07% (11184/26583)
Line Coverage 32.59% (95907/294284)
Region Coverage 31.72% (49555/156223)
Branch Coverage 28.17% (25421/90236)

@amorynan
Contributor Author

run buildall

@doris-robot

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 80.98% (1171/1446)
Line Coverage 65.05% (19964/30690)
Region Coverage 66.63% (10069/15112)
Branch Coverage 56.28% (5316/9446)

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 62.50% (75/120) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 42.07% (11184/26583)
Line Coverage 32.58% (95868/294284)
Region Coverage 31.72% (49548/156223)
Branch Coverage 28.18% (25424/90236)

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 25.00% (2/8) 🎉
Increment coverage report
Complete coverage report

@eldenmoon eldenmoon changed the title from "[cherry-pick](variant) Pick 53083 to doris branch 3.0" to "branch-3.0 cherry-pick [fix](variant) fix the reading core caused by inserting nested column and scalar column in variant sub-column" on Aug 12, 2025

// Session variable that disables variant flatten-nested. The default is true,
// which means variant flatten-nested is disabled when creating a table.
public static final String DISABLE_VARIANT_FLATTEN_NESTED = "disable_variant_flatten_nested";
Member

rename to enable_xxx

Member

@eldenmoon eldenmoon left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Aug 12, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

Contributor

@dataroaring dataroaring left a comment

LGTM

@dataroaring dataroaring merged commit 8601538 into apache:branch-3.0 Aug 12, 2025
20 of 22 checks passed
@gavinchou gavinchou mentioned this pull request Sep 1, 2025
Labels

approved Indicates a PR has been approved by one committer. reviewed

6 participants