@norberttech
Member

Change Log

Added

Fixed

  • Improve performance of merging rows into a given batch size

Changed

Removed

Deprecated

Security


Description

While investigating issue #1044, I identified one bottleneck and resolved it:

Before:

Processing: countries.tsv (115Mb) - batchSize: 5000 - limit: no-limit...
DB Entries count: 1 557 190
Total time: 224.21s

After:

Processing: countries.tsv (115Mb) - batchSize: 5000 - limit: no-limit...
DB Entries count: 1 557 190
Time: 182.09s
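
To put the two runs in perspective, the figures above work out to roughly 18.8% less wall-clock time for the same 1 557 190 entries. A minimal sketch of that arithmetic follows; the helper names `relative_reduction` and `throughput` are illustrative only and are not part of Flow PHP, and the numbers are copied from the two runs above (a single run each, so treat the result as indicative rather than a rigorous benchmark).

```python
# Figures copied from the "Before" and "After" runs in the description.
ROWS = 1_557_190      # DB entries written in both runs
BEFORE_S = 224.21     # total time before the change, in seconds
AFTER_S = 182.09      # total time after the change, in seconds


def relative_reduction(before: float, after: float) -> float:
    """Fraction of wall-clock time saved relative to the earlier run."""
    return (before - after) / before


def throughput(rows: int, seconds: float) -> float:
    """Average number of rows processed per second."""
    return rows / seconds


if __name__ == "__main__":
    saved = relative_reduction(BEFORE_S, AFTER_S)
    print(f"time saved: {BEFORE_S - AFTER_S:.2f}s ({saved:.1%})")
    print(f"before: {throughput(ROWS, BEFORE_S):.0f} rows/s")
    print(f"after:  {throughput(ROWS, AFTER_S):.0f} rows/s")
```

Expressed as throughput, this is a move from about 6 945 to about 8 552 rows per second on this particular file and batch size.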

@github-actions
Contributor

github-actions bot commented Apr 12, 2024

Flow PHP - Benchmarks

Results of the benchmarks from this PR are compared with the results from 1.x branch.

Extractors
+-----------------------+-------------------+------+-----+------------------+------------------+-----------------+
| benchmark             | subject           | revs | its | mem_peak         | mode             | rstdev          |
+-----------------------+-------------------+------+-----+------------------+------------------+-----------------+
| AvroExtractorBench    | bench_extract_10k | 1    | 3   | 35.298mb +0.00%  | 851.559ms -0.99% | ±0.31% -80.97%  |
| CSVExtractorBench     | bench_extract_10k | 1    | 3   | 5.018mb +0.02%   | 344.625ms -0.45% | ±0.40% +27.16%  |
| JsonExtractorBench    | bench_extract_10k | 1    | 3   | 5.172mb +0.02%   | 1.084s +0.83%    | ±1.64% +80.42%  |
| ParquetExtractorBench | bench_extract_10k | 1    | 3   | 135.839mb +0.00% | 919.967ms +0.30% | ±0.88% +176.10% |
| TextExtractorBench    | bench_extract_10k | 1    | 3   | 4.927mb +0.01%   | 35.464ms -3.36%  | ±0.44% -77.51%  |
| XmlExtractorBench     | bench_extract_10k | 1    | 3   | 4.933mb +0.01%   | 436.049ms -0.97% | ±0.41% -57.03%  |
+-----------------------+-------------------+------+-----+------------------+------------------+-----------------+
Transformers
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| benchmark                   | subject                  | revs | its | mem_peak         | mode            | rstdev         |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| RenameEntryTransformerBench | bench_transform_10k_rows | 1    | 3   | 116.238mb +0.00% | 60.102ms -2.28% | ±0.86% -62.06% |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
Loaders
+--------------------+----------------+------+-----+------------------+------------------+-----------------+
| benchmark          | subject        | revs | its | mem_peak         | mode             | rstdev          |
+--------------------+----------------+------+-----+------------------+------------------+-----------------+
| AvroLoaderBench    | bench_load_10k | 1    | 3   | 96.683mb +0.00%  | 461.678ms -0.52% | ±0.36% -0.24%   |
| CSVLoaderBench     | bench_load_10k | 1    | 3   | 55.162mb +0.00%  | 68.974ms -2.17%  | ±0.28% -84.89%  |
| JsonLoaderBench    | bench_load_10k | 1    | 3   | 107.592mb +0.00% | 52.268ms -0.02%  | ±2.15% +310.00% |
| ParquetLoaderBench | bench_load_10k | 1    | 3   | 227.007mb +0.00% | 1.431s -0.13%    | ±0.60% +5.36%   |
| TextLoaderBench    | bench_load_10k | 1    | 3   | 17.975mb +0.01%  | 38.903ms -3.18%  | ±0.69% -47.93%  |
+--------------------+----------------+------+-----+------------------+------------------+-----------------+
Building Blocks
+-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| benchmark               | subject                    | revs | its | mem_peak         | mode             | rstdev          |
+-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| RowsBench               | bench_chunk_10_on_10k      | 2    | 3   | 87.059mb +0.00%  | 3.411ms -8.61%   | ±2.30% -25.85%  |
| RowsBench               | bench_diff_left_1k_on_10k  | 2    | 3   | 102.658mb +0.00% | 185.569ms -3.41% | ±0.59% +17.68%  |
| RowsBench               | bench_diff_right_1k_on_10k | 2    | 3   | 85.378mb +0.00%  | 18.544ms -7.33%  | ±0.53% -52.78%  |
| RowsBench               | bench_drop_1k_on_10k       | 2    | 3   | 88.299mb +0.00%  | 1.730ms -6.79%   | ±1.07% -47.87%  |
| RowsBench               | bench_drop_right_1k_on_10k | 2    | 3   | 88.299mb +0.00%  | 1.676ms -14.79%  | ±1.68% -21.31%  |
| RowsBench               | bench_entries_on_10k       | 2    | 3   | 85.411mb +0.00%  | 2.751ms -1.50%   | ±1.45% +125.75% |
| RowsBench               | bench_filter_on_10k        | 2    | 3   | 85.940mb +0.00%  | 17.158ms +1.50%  | ±1.31% +173.19% |
| RowsBench               | bench_find_on_10k          | 2    | 3   | 85.940mb +0.00%  | 17.840ms +5.33%  | ±3.55% +340.26% |
| RowsBench               | bench_find_one_on_10k      | 10   | 3   | 83.845mb +0.00%  | 1.800μs -10.27%  | ±0.00% -100.00% |
| RowsBench               | bench_first_on_10k         | 10   | 3   | 83.845mb +0.00%  | 0.400μs 0.00%    | ±0.00% 0.00%    |
| RowsBench               | bench_flat_map_on_1k       | 2    | 3   | 93.194mb +0.00%  | 12.563ms -7.10%  | ±0.78% -37.34%  |
| RowsBench               | bench_map_on_10k           | 2    | 3   | 122.565mb +0.00% | 62.209ms -2.06%  | ±2.41% +322.01% |
| RowsBench               | bench_merge_1k_on_10k      | 2    | 3   | 86.460mb +0.00%  | 1.252ms -19.81%  | ±0.92% +58.83%  |
| RowsBench               | bench_partition_by_on_10k  | 2    | 3   | 89.807mb +0.00%  | 65.340ms -4.20%  | ±1.49% +78.27%  |
| RowsBench               | bench_remove_on_10k        | 2    | 3   | 88.562mb +0.00%  | 3.840ms -9.21%   | ±0.48% -20.38%  |
| RowsBench               | bench_sort_asc_on_1k       | 2    | 3   | 83.988mb +0.00%  | 39.478ms -4.04%  | ±1.16% +2.41%   |
| RowsBench               | bench_sort_by_on_1k        | 2    | 3   | 83.989mb +0.00%  | 39.890ms -2.43%  | ±0.38% -49.25%  |
| RowsBench               | bench_sort_desc_on_1k      | 2    | 3   | 83.988mb +0.00%  | 40.615ms -3.27%  | ±0.67% -22.93%  |
| RowsBench               | bench_sort_entries_on_1k   | 2    | 3   | 86.286mb +0.00%  | 7.394ms -1.23%   | ±2.59% +76.16%  |
| RowsBench               | bench_sort_on_1k           | 2    | 3   | 83.845mb +0.00%  | 28.868ms -3.76%  | ±0.86% -48.84%  |
| RowsBench               | bench_take_1k_on_10k       | 10   | 3   | 83.845mb +0.00%  | 14.021μs -2.03%  | ±0.89% -51.80%  |
| RowsBench               | bench_take_right_1k_on_10k | 10   | 3   | 83.845mb +0.00%  | 15.858μs -6.03%  | ±1.84% +63.61%  |
| RowsBench               | bench_unique_on_1k         | 2    | 3   | 102.659mb +0.00% | 195.638ms -1.61% | ±0.72% +12.25%  |
| TypeDetectorBench       | bench_type_detector        | 1    | 3   | 59.969mb +0.00%  | 432.870ms -0.76% | ±0.99% +12.40%  |
| TypeDetectorBench       | bench_type_detector        | 1    | 3   | 14.509mb +0.00%  | 84.911ms -2.52%  | ±0.79% -17.34%  |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 116.737mb +0.00% | 515.561ms -2.06% | ±1.23% +7.94%   |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 60.215mb +0.00%  | 260.700ms +0.75% | ±0.36% -70.74%  |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 15.149mb +0.00%  | 55.000ms -2.03%  | ±1.61% +291.54% |
+-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+

@github-actions github-actions bot added size: S and removed size: XS labels Apr 12, 2024
@norberttech norberttech merged commit d87033d into flow-php:1.x Apr 12, 2024
@norberttech norberttech deleted the performance/batching-pipeline-memory-consumption branch May 9, 2024 08:55


2 participants