@norberttech

Change Log

Added

  • Support for custom transformations in the to_branch() loader
  • Support for transformations in the to_transformation() loader


Description

Two additions for scenarios where we need to split a dataset based on a condition. Both let us apply extra transformations right before the data is written to the destination storage.
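
The idea in the description (route rows to a destination by a condition, and transform them just before they are written) can be illustrated with a minimal, library-independent sketch. This is not Flow's PHP API: `branching_loader`, `transforming_loader`, `write_all`, and the sample rows are hypothetical names and data chosen only for illustration, and the real `to_branch()` and `to_transformation()` signatures may differ.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]
Loader = Callable[[Row], None]


def transforming_loader(transform: Callable[[Row], Row], loader: Loader) -> Loader:
    """Hypothetical wrapper: transform each row right before it is written."""
    def load(row: Row) -> None:
        loader(transform(row))
    return load


def branching_loader(condition: Callable[[Row], bool], loader: Loader) -> Loader:
    """Hypothetical wrapper: pass a row on only when the condition holds."""
    def load(row: Row) -> None:
        if condition(row):
            loader(row)
    return load


def write_all(rows: Iterable[Row], loaders: List[Loader]) -> None:
    """Offer every row to every loader; each branch decides what it keeps."""
    for row in rows:
        for loader in loaders:
            loader(row)


def drop_flag(row: Row) -> Row:
    """Return a copy of the row without the 'active' column."""
    return {key: value for key, value in row.items() if key != "active"}


# In-memory lists stand in for destination storage.
active_sink: List[Row] = []
inactive_sink: List[Row] = []

rows = [
    {"id": 1, "active": True},
    {"id": 2, "active": False},
    {"id": 3, "active": True},
]

write_all(rows, [
    # Active rows are stripped of the flag just before being written.
    branching_loader(lambda r: r["active"],
                     transforming_loader(drop_flag, active_sink.append)),
    # Inactive rows are written unchanged.
    branching_loader(lambda r: not r["active"], inactive_sink.append),
])
```

Because the transformation is attached to a single branch, it affects only the rows written to that destination; the source rows and the other branch are left untouched.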


github-actions bot commented Jan 26, 2025

Flow PHP - Benchmarks

Results of the benchmarks from this PR are compared with the results from the 1.x branch.

Extractors
+-----------------------+-------------------+------+-----+-----------------+------------------+-----------------+
| benchmark             | subject           | revs | its | mem_peak        | mode             | rstdev          |
+-----------------------+-------------------+------+-----+-----------------+------------------+-----------------+
| CSVExtractorBench     | bench_extract_10k | 1    | 3   | 4.772mb +0.01%  | 559.224ms +1.36% | ±0.84% +169.65% |
| JsonExtractorBench    | bench_extract_10k | 1    | 3   | 4.841mb +0.01%  | 1.054s -1.18%    | ±1.05% +43.33%  |
| ParquetExtractorBench | bench_extract_10k | 1    | 3   | 86.490mb +0.00% | 898.264ms +0.71% | ±0.03% -89.75%  |
| TextExtractorBench    | bench_extract_10k | 1    | 3   | 4.502mb +0.01%  | 36.007ms -0.69%  | ±0.72% -11.89%  |
| XmlExtractorBench     | bench_extract_10k | 1    | 3   | 4.479mb +0.01%  | 606.670ms +0.44% | ±0.14% -80.98%  |
+-----------------------+-------------------+------+-----+-----------------+------------------+-----------------+
Transformers
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| benchmark                   | subject                  | revs | its | mem_peak         | mode            | rstdev         |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| RenameEntryTransformerBench | bench_transform_10k_rows | 1    | 3   | 127.301mb +0.00% | 71.568ms +2.03% | ±1.39% +98.68% |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
Loaders
+--------------------+----------------+------+-----+------------------+------------------+-----------------+
| benchmark          | subject        | revs | its | mem_peak         | mode             | rstdev          |
+--------------------+----------------+------+-----+------------------+------------------+-----------------+
| CSVLoaderBench     | bench_load_10k | 1    | 3   | 63.904mb +0.00%  | 101.913ms -3.63% | ±0.61% -74.02%  |
| JsonLoaderBench    | bench_load_10k | 1    | 3   | 84.313mb +0.00%  | 96.801ms -0.53%  | ±0.57% +141.35% |
| ParquetLoaderBench | bench_load_10k | 1    | 3   | 161.207mb +0.00% | 20.494s -0.37%   | ±0.22% -23.79%  |
| TextLoaderBench    | bench_load_10k | 1    | 3   | 17.968mb +0.00%  | 30.575ms -1.69%  | ±0.67% +76.03%  |
+--------------------+----------------+------+-----+------------------+------------------+-----------------+
Building Blocks
+-------------------+----------------------------+------+-----+------------------+------------------+------------------+
| benchmark         | subject                    | revs | its | mem_peak         | mode             | rstdev           |
+-------------------+----------------------------+------+-----+------------------+------------------+------------------+
| EntryFactoryBench | bench_entry_factory        | 1    | 3   | 105.939mb +0.00% | 456.047ms -0.35% | ±0.63% +5.65%    |
| EntryFactoryBench | bench_entry_factory        | 1    | 3   | 55.129mb +0.00%  | 228.063ms -3.33% | ±0.15% -88.11%   |
| EntryFactoryBench | bench_entry_factory        | 1    | 3   | 14.651mb +0.00%  | 50.412ms +0.63%  | ±0.64% -10.90%   |
| RowsBench         | bench_chunk_10_on_10k      | 2    | 3   | 96.987mb +0.00%  | 3.194ms +0.93%   | ±1.96% +220.15%  |
| RowsBench         | bench_diff_left_1k_on_10k  | 2    | 3   | 114.269mb +0.00% | 191.779ms +0.23% | ±0.23% -51.04%   |
| RowsBench         | bench_diff_right_1k_on_10k | 2    | 3   | 96.989mb +0.00%  | 19.438ms +0.93%  | ±1.01% -8.09%    |
| RowsBench         | bench_drop_1k_on_10k       | 2    | 3   | 97.862mb +0.00%  | 1.481ms +1.17%   | ±3.06% +136.52%  |
| RowsBench         | bench_drop_right_1k_on_10k | 2    | 3   | 97.862mb +0.00%  | 1.493ms +2.72%   | ±0.26% -72.62%   |
| RowsBench         | bench_entries_on_10k       | 2    | 3   | 96.023mb +0.00%  | 4.424ms +1.82%   | ±2.85% +89.84%   |
| RowsBench         | bench_filter_on_10k        | 2    | 3   | 96.552mb +0.00%  | 16.516ms +1.72%  | ±0.97% +27.59%   |
| RowsBench         | bench_find_on_10k          | 2    | 3   | 96.552mb +0.00%  | 16.517ms +1.47%  | ±0.48% -43.45%   |
| RowsBench         | bench_find_one_on_10k      | 10   | 3   | 95.243mb +0.00%  | 1.800μs -4.96%   | ±0.00% -100.00%  |
| RowsBench         | bench_first_on_10k         | 10   | 3   | 95.243mb +0.00%  | 0.400μs +33.33%  | ±0.00% +0.00%    |
| RowsBench         | bench_flat_map_on_1k       | 2    | 3   | 104.461mb +0.00% | 14.575ms +2.42%  | ±0.83% -57.37%   |
| RowsBench         | bench_map_on_10k           | 2    | 3   | 134.528mb +0.00% | 73.267ms +2.51%  | ±0.23% -87.60%   |
| RowsBench         | bench_merge_1k_on_10k      | 2    | 3   | 97.072mb +0.00%  | 1.422ms +9.19%   | ±2.35% +4.02%    |
| RowsBench         | bench_partition_by_on_10k  | 2    | 3   | 100.369mb +0.00% | 65.756ms +2.43%  | ±0.59% +68.10%   |
| RowsBench         | bench_remove_on_10k        | 2    | 3   | 98.124mb +0.00%  | 3.675ms +1.82%   | ±0.28% -78.39%   |
| RowsBench         | bench_sort_asc_on_1k       | 2    | 3   | 95.531mb +0.00%  | 44.123ms +3.43%  | ±0.95% +101.32%  |
| RowsBench         | bench_sort_by_on_1k        | 2    | 3   | 95.531mb +0.00%  | 43.585ms +3.89%  | ±0.36% -70.92%   |
| RowsBench         | bench_sort_desc_on_1k      | 2    | 3   | 95.531mb +0.00%  | 44.646ms +5.97%  | ±0.21% -21.75%   |
| RowsBench         | bench_sort_entries_on_1k   | 2    | 3   | 97.683mb +0.00%  | 8.394ms +0.15%   | ±1.59% +53.51%   |
| RowsBench         | bench_sort_on_1k           | 2    | 3   | 95.433mb +0.00%  | 29.518ms +1.92%  | ±0.60% -17.16%   |
| RowsBench         | bench_take_1k_on_10k       | 10   | 3   | 95.243mb +0.00%  | 13.570μs +2.19%  | ±1.75% +87.58%   |
| RowsBench         | bench_take_right_1k_on_10k | 10   | 3   | 95.243mb +0.00%  | 15.421μs +1.68%  | ±2.91% +116.04%  |
| RowsBench         | bench_unique_on_1k         | 2    | 3   | 114.270mb +0.00% | 194.921ms +0.83% | ±2.62% +58.37%   |
| TypeDetectorBench | bench_type_detector        | 1    | 3   | 43.777mb +0.00%  | 367.540ms +1.80% | ±1.15% +1188.47% |
| TypeDetectorBench | bench_type_detector        | 1    | 3   | 11.587mb +0.00%  | 73.584ms +1.82%  | ±1.77% -28.50%   |
+-------------------+----------------------------+------+-----+------------------+------------------+------------------+


codecov bot commented Jan 26, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.55%. Comparing base (f012687) to head (7ddd174).
Report is 2 commits behind head on 1.x.

Additional details and impacted files
@@           Coverage Diff           @@
##              1.x    #1408   +/-   ##
=======================================
  Coverage   82.54%   82.55%           
=======================================
  Files         654      654           
  Lines       17564    17576   +12     
=======================================
+ Hits        14498    14509   +11     
- Misses       3066     3067    +1     
+-----------------------------+------------------------------+
| Components                  | Coverage Δ                   |
+-----------------------------+------------------------------+
| etl                         | 85.83% <100.00%> (+0.02%) ⬆️ |
| cli                         | 85.17% <ø> (ø)               |
| lib-array-dot               | 94.53% <ø> (ø)               |
| lib-azure-sdk               | 62.56% <ø> (ø)               |
| lib-doctrine-dbal-bulk      | 97.36% <ø> (ø)               |
| lib-filesystem              | 76.23% <ø> (ø)               |
| lib-parquet                 | 84.57% <ø> (ø)               |
| lib-parquet-viewer          | 82.02% <ø> (ø)               |
| lib-rdsl                    | 87.09% <ø> (ø)               |
| lib-snappy                  | 90.69% <ø> (-0.47%) ⬇️       |
| bridge-filesystem-async-aws | 90.38% <ø> (ø)               |
| bridge-filesystem-azure     | 89.92% <ø> (ø)               |
| bridge-monolog-http         | 96.38% <ø> (ø)               |
| symfony-http-foundation     | 77.10% <ø> (ø)               |
| adapter-chartjs             | 86.45% <ø> (ø)               |
| adapter-csv                 | 89.49% <ø> (ø)               |
| adapter-doctrine            | 90.14% <ø> (ø)               |
| adapter-elasticsearch       | 97.19% <ø> (ø)               |
| adapter-google-sheet        | 78.04% <ø> (ø)               |
| adapter-http                | 59.15% <ø> (ø)               |
| adapter-json                | 92.85% <ø> (ø)               |
| adapter-logger              | 53.84% <ø> (ø)               |
| adapter-meilisearch         | 97.75% <ø> (ø)               |
| adapter-parquet             | 59.88% <ø> (ø)               |
| adapter-text                | 84.44% <ø> (ø)               |
| adapter-xml                 | 83.15% <ø> (ø)               |
+-----------------------------+------------------------------+

@norberttech norberttech merged commit 841ad67 into flow-php:1.x Jan 26, 2025
23 checks passed