
@norberttech
Member

Resolves: #1900

Change Log


Added

  • DataFrame::batchBy()
  • SortedByConstraint

Fixed

Changed

Removed

Deprecated

Security

@norberttech norberttech linked an issue Oct 23, 2025 that may be closed by this pull request
@codecov

codecov bot commented Oct 23, 2025

Codecov Report

❌ Patch coverage is 77.17391% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.30%. Comparing base (63eae33) to head (02c4642).
⚠️ Report is 4 commits behind head on 1.x.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##              1.x    #1901      +/-   ##
==========================================
- Coverage   77.66%   77.30%   -0.37%     
==========================================
  Files         824      828       +4     
  Lines       25229    24587     -642     
==========================================
- Hits        19595    19007     -588     
+ Misses       5634     5580      -54     
Components Coverage Δ
etl 88.83% <77.17%> (-0.47%) ⬇️
cli 85.96% <ø> (+0.04%) ⬆️
lib-array-dot 95.00% <ø> (+0.43%) ⬆️
lib-azure-sdk 60.39% <ø> (-0.97%) ⬇️
lib-doctrine-dbal-bulk 95.14% <ø> (-0.46%) ⬇️
lib-filesystem 80.08% <ø> (-0.17%) ⬇️
lib-types 53.50% <ø> (-0.06%) ⬇️
lib-parquet 68.34% <ø> (-0.41%) ⬇️
lib-parquet-viewer 83.04% <ø> (-0.08%) ⬇️
lib-snappy 90.18% <ø> (-0.05%) ⬇️
bridge-filesystem-async-aws 90.16% <ø> (-0.22%) ⬇️
bridge-filesystem-azure 89.47% <ø> (-0.45%) ⬇️
bridge-monolog-http 96.91% <ø> (-0.13%) ⬇️
bridge-openapi-specification 94.50% <ø> (-0.03%) ⬇️
symfony-http-foundation 73.17% <ø> (-1.25%) ⬇️
adapter-chartjs 86.36% <ø> (-0.35%) ⬇️
adapter-csv 88.46% <ø> (-0.40%) ⬇️
adapter-doctrine 90.97% <ø> (-0.25%) ⬇️
adapter-elasticsearch 97.17% <ø> (-0.07%) ⬇️
adapter-google-sheet 91.40% <ø> (-0.27%) ⬇️
adapter-http 58.82% <ø> (+0.71%) ⬆️
adapter-json 87.74% <ø> (-0.24%) ⬇️
adapter-logger 83.33% <ø> (-1.29%) ⬇️
adapter-meilisearch 97.87% <ø> (-0.09%) ⬇️
adapter-parquet 78.15% <ø> (-0.25%) ⬇️
adapter-text 82.92% <ø> (-1.52%) ⬇️
adapter-xml 82.50% <ø> (-0.36%) ⬇️

@norberttech norberttech requested a review from Copilot October 24, 2025 09:10
Contributor

Copilot AI left a comment


Pull Request Overview

This pull request introduces the batchBy() feature to group related records in batches based on a column value, ensuring data integrity when processing hierarchical data. It also includes a refactoring that renames ChunkExtractor to BatchExtractor for better naming consistency.

Key changes:

  • Added DataFrame::batchBy() method to batch rows by column value while maintaining referential integrity
  • Introduced SortedByConstraint to validate data is sorted according to specified columns
  • Refactored ChunkExtractor to BatchExtractor and deprecated chunks_from() in favor of batches()
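The batching behaviour described above can be illustrated with a minimal, self-contained sketch. It does not use Flow's API: `batch_by_column()` and its `$minSize` parameter are hypothetical names chosen for illustration, and the sketch assumes the input rows are already sorted (or at least contiguous) by the batching column.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of Flow): splits rows into batches without
 * ever separating rows that share the same value in $column.
 * Assumes rows with equal $column values are adjacent in $rows.
 *
 * @param list<array<string, mixed>> $rows
 *
 * @return list<list<array<string, mixed>>>
 */
function batch_by_column(array $rows, string $column, int $minSize = 1) : array
{
    $batches = [];
    $current = [];
    $previousKey = null;

    foreach ($rows as $row) {
        $key = $row[$column];

        // A batch is only closed on a key boundary, so a group is never split.
        if ($current !== [] && $key !== $previousKey && \count($current) >= $minSize) {
            $batches[] = $current;
            $current = [];
        }

        $current[] = $row;
        $previousKey = $key;
    }

    // Flush whatever is left, even if it is smaller than $minSize.
    if ($current !== []) {
        $batches[] = $current;
    }

    return $batches;
}

$rows = [
    ['order_id' => 1, 'sku' => 'A'],
    ['order_id' => 1, 'sku' => 'B'],
    ['order_id' => 2, 'sku' => 'C'],
    ['order_id' => 3, 'sku' => 'D'],
    ['order_id' => 3, 'sku' => 'E'],
];

$batches = batch_by_column($rows, 'order_id', 2);
```

With a minimum size of 2, the rows for order 1 form the first batch, and orders 2 and 3 share the second batch because order 2 alone is below the minimum. No order is split across batches, which is the referential-integrity property the overview refers to.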

Reviewed Changes

Copilot reviewed 29 out of 31 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/core/etl/src/Flow/ETL/DataFrame.php | Added batchBy() method to DataFrame API |
| src/core/etl/src/Flow/ETL/Extractor/BatchByExtractor.php | New extractor implementation for batching by column value |
| src/core/etl/src/Flow/ETL/Pipeline/BatchingByPipeline.php | New pipeline implementation for batch-by operations |
| src/core/etl/src/Flow/ETL/Transformation/BatchBy.php | New transformation for batch-by functionality |
| src/core/etl/src/Flow/ETL/Constraint/SortedByConstraint.php | New constraint to validate sorted data order |
| src/core/etl/src/Flow/ETL/DSL/functions.php | Added DSL functions: batched_by(), batches(), constraint_sorted_by() |
| src/core/etl/src/Flow/ETL/Extractor/BatchExtractor.php | Renamed from ChunkExtractor for consistency |
| src/core/etl/tests/Flow/ETL/Tests/Integration/DataFrame/BatchByTest.php | Integration tests for batchBy functionality |
| src/core/etl/tests/Flow/ETL/Tests/Unit/Constraint/SortedConstraintTest.php | Unit tests for SortedByConstraint |
| documentation/components/core/batch-processing.md | Documentation for the new batchBy feature |
| examples/topics/data_frame/batch_by/ | Example code demonstrating batchBy usage |
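The idea behind a sorted-order check such as SortedByConstraint can be sketched in plain PHP. This is only an illustration under stated assumptions: `is_sorted_by()` is a hypothetical helper, it handles a single column in ascending order, and it is not the implementation added in this pull request.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of Flow): returns true when the values of
 * $column never decrease from one row to the next.
 *
 * @param list<array<string, mixed>> $rows
 */
function is_sorted_by(array $rows, string $column) : bool
{
    $hasPrevious = false;
    $previous = null;

    foreach ($rows as $row) {
        // The spaceship operator returns a negative number when the current
        // value is smaller than the previous one, i.e. the order is broken.
        if ($hasPrevious && ($row[$column] <=> $previous) < 0) {
            return false;
        }

        $previous = $row[$column];
        $hasPrevious = true;
    }

    return true;
}

$sorted = is_sorted_by([['id' => 1], ['id' => 1], ['id' => 2]], 'id');
$unsorted = is_sorted_by([['id' => 2], ['id' => 1]], 'id');
```

A check like this matters for batching by column value, because grouping that only looks at adjacent rows relies on equal values being next to each other.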


Comment on lines 19 to +23
  public function test_columns_count_to_php_output_stream() : void
  {
      $loader = to_output(false, Output::column_count);

-     ob_start();
+     \ob_start();

Copilot AI Oct 24, 2025


[nitpick] The change from ob_start() to \ob_start() is inconsistent with how other global functions are used in the codebase. Either use fully qualified names consistently throughout the file, or use the function directly without the leading backslash. The same applies to ob_get_contents() and ob_end_clean().

Member Author


😂

@norberttech norberttech merged commit a31bf0c into 1.x Oct 24, 2025
16 of 18 checks passed
@norberttech norberttech deleted the 1900-proposal-add-batchby-to-group-related-records-in-batches branch October 24, 2025 09:14

Development

Successfully merging this pull request may close these issues.

[Proposal]: Add batchBy() to group related records in batches

2 participants