branch-4.0: [Improve](StreamingJob) add max_filter_ratio and strict mode for mysql/pg streaming job #60473 #60527

Merged
yiguolei merged 1 commit into branch-4.0 from auto-pick-60473-branch-4.0
Feb 5, 2026
@github-actions github-actions bot commented Feb 5, 2026

Cherry-picked from #60473

…l/pg streaming job (#60473)

### What problem does this PR solve?

Related PR: #58898
#59461

In some scenarios, it is necessary to tolerate a certain amount of
erroneous data.

Supported parameters:

`load.strict_mode`: Whether to enable strict mode. Defaults to `false`.

`load.max_filter_ratio`: The maximum ratio of filtered (erroneous) rows
allowed within the sampling window. Defaults to zero tolerance. The
sampling window is `max_interval * 10`. If the ratio of erroneous rows to
total rows within the sampling window exceeds `max_filter_ratio`, the job
is paused and requires manual intervention to check data quality issues.
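
To make the pause condition concrete, here is a minimal Python sketch of the check described above. It is not the Doris implementation: the class name `FilterRatioWindow`, its fields, and the choice to reset the counters once a window has elapsed are assumptions made purely for illustration. Only the window length (`max_interval * 10`) and the comparison against `max_filter_ratio` come from the description.

```python
from dataclasses import dataclass


@dataclass
class FilterRatioWindow:
    """Hypothetical tracker for one sampling window (illustration only)."""

    max_filter_ratio: float  # value of load.max_filter_ratio
    max_interval_s: float    # the job's max_interval, here in seconds
    window_start: float = 0.0
    filtered_rows: int = 0
    total_rows: int = 0

    @property
    def window_length_s(self) -> float:
        # Per the PR description, the sampling window is max_interval * 10.
        return self.max_interval_s * 10

    def record(self, now: float, filtered: int, total: int) -> bool:
        """Add one batch of counts; return True if the job should pause."""
        if now - self.window_start >= self.window_length_s:
            # Assumption: counters restart once a full window has elapsed.
            self.window_start = now
            self.filtered_rows = 0
            self.total_rows = 0
        self.filtered_rows += filtered
        self.total_rows += total
        if self.total_rows == 0:
            return False
        # Pause only when the ratio strictly exceeds the configured limit.
        return self.filtered_rows / self.total_rows > self.max_filter_ratio


window = FilterRatioWindow(max_filter_ratio=0.1, max_interval_s=10)
should_pause = window.record(now=0, filtered=5, total=100)  # 5% <= 10%
```

Under this reading, a limit of `0` pauses the job on the first filtered row, while a limit of `1` (as in the example below) never pauses it, because the ratio cannot exceed 1.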


Example:
```sql
CREATE JOB test_streaming_mysql_job_errormsg
ON STREAMING
FROM MYSQL (
"jdbc_url" = "jdbc:mysql://127.0.0.1:3308",
......
)
TO DATABASE database (
"table.create.properties.replication_num" = "1"
...
"load.max_filter_ratio" = "1"
)
```
@github-actions github-actions bot requested a review from yiguolei as a code owner February 5, 2026 06:16
Thearas commented Feb 5, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring dataroaring closed this Feb 5, 2026
@dataroaring dataroaring reopened this Feb 5, 2026
Thearas commented Feb 5, 2026

run buildall

JNSimba commented Feb 5, 2026

run p0

@yiguolei yiguolei merged commit b31a957 into branch-4.0 Feb 5, 2026
26 of 29 checks passed
@github-actions github-actions bot deleted the auto-pick-60473-branch-4.0 branch February 5, 2026 09:42
4 participants