-
-
Notifications
You must be signed in to change notification settings - Fork 48
1900 proposal add batchby to group related records in batches #1901
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
1900 proposal add batchby to group related records in batches #1901
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## 1.x #1901 +/- ##
==========================================
- Coverage 77.66% 77.30% -0.37%
==========================================
Files 824 828 +4
Lines 25229 24587 -642
==========================================
- Hits 19595 19007 -588
+ Misses 5634 5580 -54
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This pull request introduces the batchBy() feature to group related records in batches based on a column value, ensuring data integrity when processing hierarchical data. It also includes a refactoring that renames ChunkExtractor to BatchExtractor for better naming consistency.
Key changes:
- Added
DataFrame::batchBy()method to batch rows by column value while maintaining referential integrity - Introduced
SortedByConstraintto validate data is sorted according to specified columns - Refactored
ChunkExtractortoBatchExtractorand deprecatedchunks_from()in favor ofbatches()
Reviewed Changes
Copilot reviewed 29 out of 31 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
src/core/etl/src/Flow/ETL/DataFrame.php |
Added batchBy() method to DataFrame API |
src/core/etl/src/Flow/ETL/Extractor/BatchByExtractor.php |
New extractor implementation for batching by column value |
src/core/etl/src/Flow/ETL/Pipeline/BatchingByPipeline.php |
New pipeline implementation for batch-by operations |
src/core/etl/src/Flow/ETL/Transformation/BatchBy.php |
New transformation for batch-by functionality |
src/core/etl/src/Flow/ETL/Constraint/SortedByConstraint.php |
New constraint to validate sorted data order |
src/core/etl/src/Flow/ETL/DSL/functions.php |
Added DSL functions: batched_by(), batches(), constraint_sorted_by() |
src/core/etl/src/Flow/ETL/Extractor/BatchExtractor.php |
Renamed from ChunkExtractor for consistency |
src/core/etl/tests/Flow/ETL/Tests/Integration/DataFrame/BatchByTest.php |
Integration tests for batchBy functionality |
src/core/etl/tests/Flow/ETL/Tests/Unit/Constraint/SortedConstraintTest.php |
Unit tests for SortedByConstraint |
documentation/components/core/batch-processing.md |
Documentation for the new batchBy feature |
examples/topics/data_frame/batch_by/ |
Example code demonstrating batchBy usage |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| public function test_columns_count_to_php_output_stream() : void | ||
| { | ||
| $loader = to_output(false, Output::column_count); | ||
|
|
||
| ob_start(); | ||
| \ob_start(); |
Copilot
AI
Oct 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The change from ob_start() to \\ob_start() is inconsistent with how other global functions are used in the codebase. Either use fully qualified names consistently throughout the file, or use the function directly without the leading backslash. The same applies to ob_get_contents() and ob_end_clean().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😂
Resolves: #1900
Change Log
Added
Fixed
Changed
Removed
Deprecated
Security