
branch-3.1: [fix](paimon) Handle oversized CHAR/VARCHAR fields in Paimon to Doris type mapping #55051 #55531

Merged
morrySnow merged 3 commits into apache:branch-3.1 from vinlee19:handle_paimon_oversize_schema_3.1
Sep 4, 2025

Conversation

@vinlee19
Contributor

@vinlee19 vinlee19 commented Sep 1, 2025

Backport of #55051

Petrichor added 2 commits September 1, 2025 15:26
…type mapping (apache#55051)

### What problem does this PR solve?
In PR apache#49623, we implemented
conversion from Paimon `VARCHAR/CHAR` types to Doris `VARCHAR/CHAR`
types. However, there are significant differences in the maximum length
constraints between these systems:

**Apache Paimon:**
- `CHAR`: Fixed-length character string declared using `CHAR(n)`, where n
is the number of code points. n must have a value between `1` and
`2,147,483,647` (inclusive). Defaults to n=1 if no length is specified.
- `VARCHAR`: Variable-length character string declared using `VARCHAR(n)`,
where n is the maximum number of code points. n must have a value
between `1` and `2,147,483,647` (inclusive). Defaults to n=1 if no
length is specified.

**Apache Doris:**
- `CHAR`: Maximum length is `255` characters
- `VARCHAR`: Maximum length is `65,533` characters

**Solution:**
This PR addresses the length constraint mismatch by automatically
converting oversized Paimon VARCHAR/CHAR types to Doris STRING type when
they exceed Doris limits:
- Paimon `VARCHAR` with length > 65,533 → Doris `STRING`
- Paimon `CHAR` with length > 255 → Doris `STRING`

This ensures compatibility while preserving data integrity during type
mapping from Paimon to Doris.
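
The fallback rule described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the code changed by this PR: the class `PaimonLengthMappingSketch`, its methods, and the use of plain strings to stand in for Doris types are all hypothetical. The two limits are the ones quoted in this description.

```java
// Hypothetical, self-contained sketch of the length-based fallback.
// It does not use the Doris or Paimon APIs; type names are plain strings.
final class PaimonLengthMappingSketch {
    // Limits as quoted in the PR description.
    static final int DORIS_MAX_CHAR_LENGTH = 255;
    static final int DORIS_MAX_VARCHAR_LENGTH = 65533;

    private PaimonLengthMappingSketch() {
    }

    // Maps a Paimon CHAR(n) to a Doris type name.
    static String mapChar(int paimonLength) {
        requirePositive(paimonLength);
        if (paimonLength > DORIS_MAX_CHAR_LENGTH) {
            // Too long for Doris CHAR: fall back to STRING.
            return "STRING";
        }
        return "CHAR(" + paimonLength + ")";
    }

    // Maps a Paimon VARCHAR(n) to a Doris type name.
    static String mapVarchar(int paimonLength) {
        requirePositive(paimonLength);
        if (paimonLength > DORIS_MAX_VARCHAR_LENGTH) {
            // Too long for Doris VARCHAR: fall back to STRING.
            return "STRING";
        }
        return "VARCHAR(" + paimonLength + ")";
    }

    // Paimon lengths start at 1, so anything smaller is rejected.
    private static void requirePositive(int length) {
        if (length < 1) {
            throw new IllegalArgumentException("length must be >= 1: " + length);
        }
    }
}
```

In this sketch the comparison is strict, so a length exactly at the limit (`CHAR(255)`, `VARCHAR(65533)`) keeps its original type and only larger values become `STRING`.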

(cherry picked from commit 6622f50)
@vinlee19 vinlee19 requested a review from morrySnow as a code owner September 1, 2025 07:58
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@vinlee19
Contributor Author

vinlee19 commented Sep 1, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 32657 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 8d4ed79c4ffd6e4da3d607374138e1ed95ee28d2, data reload: false

------ Round 1 ----------------------------------
q1	17583	5539	5430	5430
q2	2061	289	155	155
q3	10429	1270	717	717
q4	10200	869	449	449
q5	7653	2385	2153	2153
q6	182	161	134	134
q7	893	738	619	619
q8	9314	1421	1143	1143
q9	5417	4901	4931	4901
q10	6751	2283	1826	1826
q11	461	281	263	263
q12	332	357	220	220
q13	17775	3630	3037	3037
q14	226	222	214	214
q15	536	465	463	463
q16	406	428	380	380
q17	598	860	361	361
q18	6852	6489	6436	6436
q19	1215	954	568	568
q20	335	348	213	213
q21	2783	2199	1999	1999
q22	1048	1017	976	976
Total cold run time: 103050 ms
Total hot run time: 32657 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5564	5518	5527	5518
q2	254	331	226	226
q3	2223	2639	2332	2332
q4	1332	1799	1359	1359
q5	4447	4868	5056	4868
q6	169	163	130	130
q7	2098	1935	1763	1763
q8	2585	2786	2730	2730
q9	7308	7138	7131	7131
q10	3029	3285	2797	2797
q11	583	498	477	477
q12	651	773	592	592
q13	3345	3766	3240	3240
q14	297	301	268	268
q15	530	484	471	471
q16	458	487	451	451
q17	1261	1754	1260	1260
q18	7823	7224	7240	7224
q19	782	1176	1029	1029
q20	1984	2051	1859	1859
q21	5261	4898	4540	4540
q22	1104	1058	1061	1058
Total cold run time: 53088 ms
Total hot run time: 51323 ms

@doris-robot

TPC-DS: Total hot run time: 190847 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 8d4ed79c4ffd6e4da3d607374138e1ed95ee28d2, data reload: false

query1	1011	383	372	372
query2	6538	2055	1916	1916
query3	6705	219	225	219
query4	33680	23741	24183	23741
query5	4332	637	479	479
query6	268	198	191	191
query7	4618	497	318	318
query8	287	252	232	232
query9	9726	2626	2599	2599
query10	477	342	268	268
query11	18066	15433	15323	15323
query12	161	110	106	106
query13	1656	553	420	420
query14	9738	6850	7382	6850
query15	208	203	182	182
query16	7807	616	488	488
query17	1457	740	589	589
query18	1997	414	326	326
query19	206	193	166	166
query20	127	121	123	121
query21	209	129	113	113
query22	4497	4601	4464	4464
query23	35352	33932	33364	33364
query24	7551	2669	2679	2669
query25	536	493	422	422
query26	1214	289	174	174
query27	2543	455	340	340
query28	5527	2181	2157	2157
query29	836	589	468	468
query30	255	191	158	158
query31	986	898	845	845
query32	89	62	59	59
query33	546	374	312	312
query34	752	822	522	522
query35	776	816	736	736
query36	991	1059	971	971
query37	106	96	70	70
query38	3936	3968	3937	3937
query39	1486	1472	1450	1450
query40	209	116	110	110
query41	54	55	57	55
query42	116	106	108	106
query43	524	523	494	494
query44	1360	817	821	817
query45	190	177	171	171
query46	879	1035	678	678
query47	1867	1923	1856	1856
query48	428	433	366	366
query49	806	509	420	420
query50	675	682	436	436
query51	7254	7362	7123	7123
query52	105	108	100	100
query53	230	256	188	188
query54	567	566	485	485
query55	81	82	82	82
query56	291	278	270	270
query57	1214	1278	1185	1185
query58	243	219	228	219
query59	3024	3067	2969	2969
query60	306	292	272	272
query61	140	135	139	135
query62	814	756	700	700
query63	225	192	190	190
query64	4579	974	654	654
query65	3302	3242	3266	3242
query66	1062	413	313	313
query67	16051	15601	15761	15601
query68	8236	856	573	573
query69	487	301	265	265
query70	1177	1135	1088	1088
query71	505	308	271	271
query72	5605	3685	2572	2572
query73	642	765	363	363
query74	10246	9220	9374	9220
query75	3173	3184	2650	2650
query76	3135	1186	788	788
query77	494	419	285	285
query78	10334	10525	9684	9684
query79	3120	937	611	611
query80	721	523	455	455
query81	530	261	222	222
query82	653	120	92	92
query83	168	170	148	148
query84	241	102	86	86
query85	795	376	301	301
query86	413	305	303	303
query87	4323	4298	4308	4298
query88	5169	2445	2415	2415
query89	408	342	296	296
query90	1825	190	193	190
query91	136	144	113	113
query92	69	58	53	53
query93	1940	907	555	555
query94	708	421	311	311
query95	348	279	268	268
query96	515	624	286	286
query97	3222	3333	3184	3184
query98	223	211	202	202
query99	1612	1417	1325	1325
Total cold run time: 295022 ms
Total hot run time: 190847 ms

@doris-robot

ClickBench: Total hot run time: 28.94 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 8d4ed79c4ffd6e4da3d607374138e1ed95ee28d2, data reload: false

query1	0.03	0.03	0.04
query2	0.07	0.03	0.02
query3	0.23	0.07	0.06
query4	1.62	0.11	0.11
query5	0.52	0.52	0.50
query6	1.14	0.74	0.73
query7	0.02	0.02	0.01
query8	0.04	0.03	0.03
query9	0.57	0.51	0.52
query10	0.57	0.55	0.56
query11	0.14	0.10	0.10
query12	0.14	0.12	0.11
query13	0.61	0.61	0.59
query14	0.77	0.81	0.79
query15	0.84	0.84	0.85
query16	0.39	0.39	0.38
query17	1.05	1.05	1.01
query18	0.24	0.23	0.23
query19	1.86	1.91	1.90
query20	0.02	0.01	0.02
query21	15.39	0.92	0.57
query22	0.75	0.72	0.71
query23	15.11	1.39	0.52
query24	3.26	1.72	0.96
query25	0.16	0.11	0.05
query26	0.37	0.15	0.14
query27	0.05	0.04	0.04
query28	13.22	0.96	0.44
query29	12.58	3.87	3.19
query30	0.27	0.08	0.06
query31	2.82	0.59	0.39
query32	3.22	0.53	0.47
query33	3.05	3.04	3.00
query34	16.80	5.24	4.54
query35	4.57	4.55	4.58
query36	0.65	0.51	0.48
query37	0.10	0.06	0.06
query38	0.04	0.04	0.03
query39	0.03	0.02	0.02
query40	0.17	0.13	0.13
query41	0.08	0.03	0.03
query42	0.03	0.03	0.02
query43	0.03	0.03	0.03
Total cold run time: 103.62 s
Total hot run time: 28.94 s

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 50.00% (4/8) 🎉
Increment coverage report
Complete coverage report

@vinlee19
Contributor Author

vinlee19 commented Sep 1, 2025

run external

@vinlee19
Contributor Author

vinlee19 commented Sep 1, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 32360 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 291ec18c396fb08d24336357c74031b2d8704ff7, data reload: false

------ Round 1 ----------------------------------
q1	17586	5565	5375	5375
q2	2025	398	295	295
q3	12418	1239	729	729
q4	10544	857	445	445
q5	9372	2352	2119	2119
q6	181	159	131	131
q7	892	729	612	612
q8	9319	1386	1115	1115
q9	5312	4904	4888	4888
q10	6776	2259	1843	1843
q11	473	276	268	268
q12	336	352	215	215
q13	17788	3588	3010	3010
q14	244	232	210	210
q15	526	481	454	454
q16	432	437	385	385
q17	585	847	353	353
q18	6909	6254	6427	6254
q19	1375	935	542	542
q20	321	333	200	200
q21	2737	2150	1929	1929
q22	1051	988	1002	988
Total cold run time: 107202 ms
Total hot run time: 32360 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5549	5537	5550	5537
q2	233	317	234	234
q3	2206	2642	2338	2338
q4	1352	1759	1375	1375
q5	4363	4953	4967	4953
q6	187	163	130	130
q7	2052	2001	1816	1816
q8	2613	2772	2697	2697
q9	7241	7203	7212	7203
q10	3032	3267	2743	2743
q11	582	528	492	492
q12	696	764	604	604
q13	3398	3781	3180	3180
q14	289	310	276	276
q15	519	473	461	461
q16	446	507	442	442
q17	1237	1734	1263	1263
q18	7498	7504	7326	7326
q19	842	1171	1078	1078
q20	2031	2055	1896	1896
q21	5276	4917	4582	4582
q22	1091	1034	1056	1034
Total cold run time: 52733 ms
Total hot run time: 51660 ms

@doris-robot

TPC-DS: Total hot run time: 192529 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 291ec18c396fb08d24336357c74031b2d8704ff7, data reload: false

query1	970	404	399	399
query2	6220	2057	2014	2014
query3	8697	202	203	202
query4	33235	23809	23461	23461
query5	3656	588	432	432
query6	291	205	172	172
query7	4201	487	315	315
query8	290	236	228	228
query9	9456	2598	2593	2593
query10	466	321	258	258
query11	17905	15778	15334	15334
query12	162	111	106	106
query13	1565	546	429	429
query14	10023	7085	6814	6814
query15	241	195	182	182
query16	8084	691	449	449
query17	1539	751	585	585
query18	2176	423	332	332
query19	217	190	174	174
query20	127	131	122	122
query21	206	124	107	107
query22	4607	4589	4500	4500
query23	34805	33973	33918	33918
query24	7866	2686	2740	2686
query25	540	507	411	411
query26	1265	287	175	175
query27	2209	497	384	384
query28	5331	2218	2228	2218
query29	814	608	461	461
query30	244	197	166	166
query31	1044	937	835	835
query32	107	58	60	58
query33	495	361	330	330
query34	746	886	517	517
query35	787	835	723	723
query36	1029	1079	971	971
query37	102	96	68	68
query38	3988	4087	3997	3997
query39	1532	1502	1462	1462
query40	212	120	101	101
query41	50	48	47	47
query42	119	112	112	112
query43	517	527	507	507
query44	1364	835	839	835
query45	187	182	176	176
query46	888	1051	670	670
query47	1971	1988	1926	1926
query48	416	426	338	338
query49	781	495	403	403
query50	694	687	431	431
query51	7310	7302	7387	7302
query52	99	100	91	91
query53	232	260	190	190
query54	561	570	489	489
query55	77	79	81	79
query56	258	278	267	267
query57	1288	1290	1228	1228
query58	237	214	234	214
query59	3093	3225	3082	3082
query60	284	280	263	263
query61	111	110	128	110
query62	818	762	716	716
query63	228	198	194	194
query64	4648	985	646	646
query65	3389	3283	3317	3283
query66	1069	416	304	304
query67	16566	15818	15621	15621
query68	7337	828	552	552
query69	494	304	266	266
query70	1216	1128	1044	1044
query71	388	293	261	261
query72	5725	3740	3821	3740
query73	641	739	357	357
query74	10246	9444	9000	9000
query75	3144	3151	2678	2678
query76	3066	1166	773	773
query77	501	362	283	283
query78	10356	10421	9702	9702
query79	3810	844	598	598
query80	737	535	449	449
query81	516	253	217	217
query82	579	123	91	91
query83	163	165	146	146
query84	251	105	85	85
query85	802	353	300	300
query86	403	319	296	296
query87	4310	4380	4329	4329
query88	5207	2439	2420	2420
query89	413	328	300	300
query90	1778	196	192	192
query91	133	139	112	112
query92	74	57	54	54
query93	2605	999	558	558
query94	682	414	301	301
query95	358	283	279	279
query96	506	628	281	281
query97	3220	3278	3161	3161
query98	229	203	199	199
query99	1346	1428	1292	1292
Total cold run time: 295941 ms
Total hot run time: 192529 ms

@doris-robot

ClickBench: Total hot run time: 28.49 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 291ec18c396fb08d24336357c74031b2d8704ff7, data reload: false

query1	0.03	0.03	0.03
query2	0.07	0.03	0.03
query3	0.23	0.07	0.07
query4	1.63	0.10	0.10
query5	0.53	0.53	0.51
query6	1.14	0.73	0.72
query7	0.02	0.02	0.01
query8	0.04	0.03	0.02
query9	0.56	0.50	0.52
query10	0.56	0.55	0.54
query11	0.15	0.10	0.10
query12	0.14	0.10	0.11
query13	0.60	0.62	0.61
query14	0.75	0.81	0.80
query15	0.84	0.83	0.84
query16	0.40	0.39	0.41
query17	1.05	1.02	1.04
query18	0.25	0.23	0.22
query19	1.88	1.78	1.82
query20	0.01	0.01	0.02
query21	15.42	0.92	0.56
query22	0.73	0.74	0.62
query23	15.19	1.39	0.51
query24	3.34	1.60	0.64
query25	0.21	0.16	0.07
query26	0.32	0.15	0.14
query27	0.05	0.05	0.05
query28	13.68	1.03	0.45
query29	12.58	3.86	3.23
query30	0.25	0.10	0.06
query31	2.81	0.60	0.38
query32	3.22	0.55	0.47
query33	3.00	3.03	3.04
query34	16.64	5.19	4.58
query35	4.59	4.53	4.57
query36	0.65	0.48	0.48
query37	0.09	0.06	0.05
query38	0.04	0.03	0.04
query39	0.04	0.02	0.02
query40	0.16	0.13	0.12
query41	0.08	0.03	0.03
query42	0.03	0.03	0.02
query43	0.03	0.03	0.03
Total cold run time: 104.03 s
Total hot run time: 28.49 s

@morrySnow morrySnow merged commit 37be2ae into apache:branch-3.1 Sep 4, 2025
22 checks passed
@morrySnow morrySnow mentioned this pull request Sep 22, 2025
4 participants

Comments