[feat](streaming job) Introduce streaming job for incremental load#56175
JNSimba merged 28 commits into apache:master
…5790) ### What problem does this PR solve? introduce streaming job schedule task
### What problem does this PR solve? 1. Add Create StreamingJob and Alter Job 2. add job and task tvf schema 3. add offset api
### What problem does this PR solve? Introduce streaming task scheduler to schedule all streaming tasks.
…pache#55862) ### What problem does this PR solve? 1. add StreamingInsertTask For StreamJob 2. Improve StreamInsertJob 3. add insertcommand rewrite tvf params
…#55918) ### What problem does this PR solve? Implement offset persistence and replay logic (shared-nothing mode).
### What problem does this PR solve? 1. add S3 Stream job split offset 2. fix stream job create bug
### What problem does this PR solve? Fix streaming job problem
### What problem does this PR solve? Add fetch meta and fix rewrite tvf problem
…d mode (apache#55975) ### What problem does this PR solve? Implement offset persistence and replay in cloud mode.
…che#56056) ### What problem does this PR solve? Register listener id when begin transaction to ensure before/after commit logic would be executed.
### What problem does this PR solve? Add create job case and fix job bug
…re exactly-once semantics (apache#56135) ### What problem does this PR solve? Add task commit check and job event lock to ensure exactly-once semantics.
### What problem does this PR solve? Fix compile error
### What problem does this PR solve? Fix register callback id invalid.
…ay in cloud mode" (apache#56149) ### What problem does this PR solve? Revert "implement offset persistence and replay in cloud mode"
…and replay in cloud mode"" (apache#56156) Reverts apache#56149
### What problem does this PR solve? Fix Alter Job and schedule bug etc
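The commit messages above mention a task commit check and a job event lock used to ensure exactly-once semantics. The following is a minimal sketch of that general idea, not the code merged in this PR: a commit is accepted only if it comes from the task the job currently expects and starts exactly at the last committed offset, and the check runs under a lock so concurrent job events cannot interleave. The names `StreamingJobSketch`, `startTask`, and `tryCommit` are hypothetical, and modeling offsets as plain `long` values is an assumption made for brevity.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; these names are illustrative and are not Doris classes.
class StreamingJobSketch {
    private final ReentrantLock eventLock = new ReentrantLock();
    private long runningTaskId = -1;  // id of the task allowed to commit next, -1 if none
    private long committedOffset = 0; // last offset recorded by a successful commit

    // Registers a new running task and returns the offset it should start reading from.
    long startTask(long taskId) {
        eventLock.lock();
        try {
            runningTaskId = taskId;
            return committedOffset;
        } finally {
            eventLock.unlock();
        }
    }

    // Commit check: accept only a commit from the running task that extends
    // the committed offset without a gap or overlap.
    boolean tryCommit(long taskId, long startOffset, long endOffset) {
        eventLock.lock();
        try {
            boolean fromRunningTask = taskId == runningTaskId;
            boolean contiguous = startOffset == committedOffset && endOffset > startOffset;
            if (!fromRunningTask || !contiguous) {
                return false; // stale, duplicate, or out-of-order commit
            }
            committedOffset = endOffset;
            runningTaskId = -1; // the task is done, so replaying its commit is rejected
            return true;
        } finally {
            eventLock.unlock();
        }
    }

    long committedOffset() {
        eventLock.lock();
        try {
            return committedOffset;
        } finally {
            eventLock.unlock();
        }
    }
}
```

In this sketch a retried commit from the same task, or a late commit from a task that has since been replaced, returns `false` and leaves the offset untouched, so each range is applied at most once.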
Force-pushed from 975d1da to bed4094.
Force-pushed from bed4094 to 7377e32.
Pull Request Overview
This PR introduces streaming job functionality for incremental load operations in Apache Doris. It implements a new type of job that can continuously consume data from external sources (like S3) and incrementally load it into Doris tables.
Key changes include:
- Added streaming job execution type and associated infrastructure
- Implemented S3-based offset provider for tracking incremental data consumption
- Created new protobuf definitions for streaming job metadata and transaction attachments
- Added ALTER JOB command for modifying streaming job properties
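The overview describes an S3-based offset provider that tracks incremental consumption. One common way to do this, shown here as a hedged sketch rather than the PR's actual implementation, is to remember the last consumed object key and pick the next batch of keys that sort after it. `FileOffsetSketch` and `nextBatch` are hypothetical names. The sketch assumes the listing arrives in ascending key order and that keys are ASCII, so that Java's `String.compareTo` matches the listing order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the offset provider class added by this PR.
class FileOffsetSketch {
    // Returns up to maxFiles keys that sort strictly after lastConsumedKey.
    // sortedKeys must be in ascending order; a null lastConsumedKey means
    // nothing has been consumed yet.
    static List<String> nextBatch(List<String> sortedKeys, String lastConsumedKey, int maxFiles) {
        List<String> batch = new ArrayList<>();
        for (String key : sortedKeys) {
            if (batch.size() >= maxFiles) {
                break; // batch is full; remaining keys wait for the next task
            }
            if (lastConsumedKey == null || key.compareTo(lastConsumedKey) > 0) {
                batch.add(key);
            }
        }
        return batch;
    }
}
```

After a batch commits, the last key of that batch becomes the new `lastConsumedKey`, so the next call resumes where the previous one stopped.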
Reviewed Changes
Copilot reviewed 57 out of 57 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| gensrc/proto/cloud.proto | Added protobuf definitions for streaming job metadata and transaction attachments |
| fe/fe-core/src/main/java/org/apache/doris/job/extensions/insert/streaming/ | Core streaming job implementation including job, task, and properties classes |
| fe/fe-core/src/main/java/org/apache/doris/job/offset/ | Offset provider framework for tracking data consumption progress |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/ | Command classes for creating and altering streaming jobs |
| fe/fe-core/src/main/java/org/apache/doris/fs/ | File system extensions for batch listing with limits |
| cloud/src/meta-service/ | Cloud mode metadata service extensions for streaming job progress |
| regression-test/suites/job_p0/streaming_job/ | Integration test for streaming insert job functionality |
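The table lists an offset framework for tracking consumption progress, and the commits mention offset persistence and replay. As a rough illustration of what replay means in general, and not a description of how Doris stores or replays its metadata, the sketch below records each committed offset in a journal and rebuilds the current offset by re-applying the journal after a restart. `OffsetJournalSketch`, `Entry`, and the in-memory list standing in for durable storage are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the in-memory list stands in for a durable log.
class OffsetJournalSketch {
    // One persisted record: which task committed and the offset it reached.
    static final class Entry {
        final long taskId;
        final long endOffset;

        Entry(long taskId, long endOffset) {
            this.taskId = taskId;
            this.endOffset = endOffset;
        }
    }

    private final List<Entry> journal = new ArrayList<>();
    private long currentOffset = 0;

    // Records the commit in the journal, then applies it to in-memory state.
    void commit(long taskId, long endOffset) {
        Entry entry = new Entry(taskId, endOffset);
        journal.add(entry);
        apply(entry);
    }

    // Applying is idempotent: an entry seen twice never moves the offset backwards.
    private void apply(Entry entry) {
        currentOffset = Math.max(currentOffset, entry.endOffset);
    }

    // Rebuilds state after a restart by re-applying every persisted entry in order.
    static OffsetJournalSketch replay(List<Entry> persisted) {
        OffsetJournalSketch restored = new OffsetJournalSketch();
        for (Entry entry : persisted) {
            restored.journal.add(entry);
            restored.apply(entry);
        }
        return restored;
    }

    long currentOffset() {
        return currentOffset;
    }

    List<Entry> journal() {
        return new ArrayList<>(journal); // defensive copy
    }
}
```

Using `Math.max` in `apply` is a design choice of this sketch: it keeps replay safe even if the same entry is delivered more than once.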
Resolved review comments:
- fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/AlterJobCommand.java (outdated)
- fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/AlterJobCommand.java
- ...-core/src/main/java/org/apache/doris/job/extensions/insert/streaming/StreamingInsertJob.java (outdated)
- ...-core/src/main/java/org/apache/doris/job/extensions/insert/streaming/StreamingInsertJob.java (outdated)
- ...core/src/main/java/org/apache/doris/job/extensions/insert/streaming/StreamingInsertTask.java (outdated)
- fe/fe-core/src/main/java/org/apache/doris/fs/obj/S3ObjStorage.java (outdated)
### What problem does this PR solve? improve job api
Force-pushed from 35398e1 to 698fd51.
TPC-DS: Total hot run time: 2766 ms
ClickBench: Total hot run time: 0.07 s
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
What problem does this PR solve?
closed #56191
Release note
None