[Feature](Streaming Job) Extend streaming job to support Postgres synchronization #59461

JNSimba merged 16 commits into apache:master
Conversation
Thank you for your contribution to Apache Doris. Please clearly describe your PR:

run buildall
TPC-H: Total hot run time: 34820 ms
TPC-DS: Total hot run time: 174319 ms
ClickBench: Total hot run time: 26.89 s
run buildall

run nonConcurrent
PR approved by at least one committer and no changes requested.

PR approved by anyone and no changes requested.
skip buildall |
### What problem does this PR solve?

Related PR: #59461

1. PostgreSQL uses replication slots for data consumption, but only one client can use a slot at a time. Therefore, after consuming data in the WAL phase, the slot needs to be closed. This doesn't affect MySQL, but the connection can be closed there as well to avoid consuming connections.
2. Create the PG slot first when creating the job.
3. Fix an unstable test case.
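For context, a minimal sketch of the Postgres-side slot lifecycle described above, assuming a logical slot named `doris_slot` and the `pgoutput` plugin (both names are illustrative, not taken from this PR):

```
-- Create the logical replication slot up front (this change moves slot
-- creation to job-creation time).
SELECT pg_create_logical_replication_slot('doris_slot', 'pgoutput');

-- A slot can only be consumed by one client at a time; 'active' shows whether
-- a connection currently holds it, which is why the consumer connection is
-- closed after each WAL read phase.
SELECT slot_name, active FROM pg_replication_slots;

-- Drop the slot when the streaming job is removed.
SELECT pg_drop_replication_slot('doris_slot');
```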
### What problem does this PR solve?

Related PR: #58898 #59461

This PR primarily optimizes the speed of incremental and snapshot reads.

1. For incremental reads:
   - Binding the fetch logic to an interval allows fetching data within that interval.
   - Splitting the fetch and write logic so they run asynchronously.
2. For snapshot reads:
   - Introducing the `snapshot_split_size` and `snapshot_parallelism` parameters.
   - `snapshot_split_size`: Adjusts the size of each chunk during the split phase, allowing each split to fetch more data.
   - `snapshot_parallelism`: The degree of parallelism during the snapshot read phase, i.e., how many chunks can run simultaneously and how many chunks are scheduled in a single task.
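A hedged sketch of how these two parameters might be supplied, following the CREATE JOB shape quoted later in this thread; the job name, jdbc_url, target database, concrete values, and the placement of the properties are assumptions, and only `snapshot_split_size` and `snapshot_parallelism` come from the description above:

```
CREATE JOB streaming_job_snapshot_tuning
ON STREAMING
FROM MYSQL (
    "jdbc_url" = "jdbc:mysql://127.0.0.1:3308",
    ......
)
TO DATABASE target_db (
    "table.create.properties.replication_num" = "1",
    -- larger splits let each chunk fetch more rows during the snapshot phase
    "snapshot_split_size" = "65536",
    -- how many chunks are read concurrently (and scheduled per task)
    "snapshot_parallelism" = "4"
)
```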
…l/pg streaming job (#60473)

### What problem does this PR solve?

Related PR: #58898 #59461

In some scenarios, it is necessary to tolerate a certain amount of erroneous data. Supported parameters:

- `load.strict_mode`: Whether to enable strict mode, defaults to false.
- `load.max_filter_ratio`: The maximum allowed filtering rate within the sampling window, defaults to zero tolerance. The sampling window is `max_interval * 10`. That is, if erroneous rows / total rows exceeds `max_filter_ratio` within the sampling window, the job will be paused, requiring manual intervention to check data quality issues.

eg:

```
CREATE JOB test_streaming_mysql_job_errormsg
ON STREAMING
FROM MYSQL (
    "jdbc_url" = "jdbc:mysql://127.0.0.1:3308",
    ......
)
TO DATABASE database (
    "table.create.properties.replication_num" = "1"
    ...
    "load.max_filter_ratio" = "1"
)
```
…e#60560)

### What problem does this PR solve?

Related PR: #59461

To enhance partition table synchronization, add `publish_via_partition_root` when creating a PUBLICATION instance, specifically for PG 13+.
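For reference, a minimal sketch of the Postgres statement this option corresponds to on PG 13+; the publication name and table are illustrative and not taken from this PR:

```
-- publish_via_partition_root makes changes to individual partitions appear as
-- changes to the partitioned (root) table, so downstream consumers see one
-- logical table instead of one stream per partition.
CREATE PUBLICATION doris_publication
    FOR TABLE public.orders
    WITH (publish_via_partition_root = true);
```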
### What problem does this PR solve?

Issue #58896 and PR #58898 implement multi-table synchronization for MySQL. The main purpose of this PR is to extend the data source to Postgres.
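As a rough illustration only, a hypothetical Postgres counterpart of the MySQL job quoted earlier in this thread; the `FROM POSTGRES` keyword, the jdbc_url, and the database names are assumptions for illustration and are not confirmed by this PR:

```
-- Hypothetical sketch; syntax mirrors the MySQL example above and is not
-- taken from this PR.
CREATE JOB test_streaming_pg_job
ON STREAMING
FROM POSTGRES (
    "jdbc_url" = "jdbc:postgresql://127.0.0.1:5432/source_db",
    ......
)
TO DATABASE target_db (
    "table.create.properties.replication_num" = "1"
)
```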
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merges this PR)