@stloyd stloyd commented May 19, 2025

Change Log

Added

Fixed

  • Close the Excel extractor when the limit is reached
  • Prevent a memory leak in `ExcelExtractor` by no longer copying each row inside the loop

Changed

Removed

Deprecated

Security


Description

@stloyd stloyd requested a review from norberttech as a code owner May 19, 2025 11:14

github-actions bot commented May 19, 2025

Flow PHP - Benchmarks

Results of the benchmarks from this PR are compared with the results from the 1.x branch.

Extractors
+-----------------------+-------------------+------+-----+------------------+------------------+------------------+
| benchmark             | subject           | revs | its | mem_peak         | mode             | rstdev           |
+-----------------------+-------------------+------+-----+------------------+------------------+------------------+
| CSVExtractorBench     | bench_extract_10k | 1    | 3   | 4.775mb +0.01%   | 613.289ms +2.38% | ±1.00% +851.18%  |
| ExcelExtractorBench   | bench_extract_10k | 1    | 3   | 75.287mb -10.30% | 1.681s +10.28%   | ±0.79% +136.19%  |
| JsonExtractorBench    | bench_extract_10k | 1    | 3   | 5.018mb +0.00%   | 1.267s -1.25%    | ±0.87% -8.74%    |
| ParquetExtractorBench | bench_extract_10k | 1    | 3   | 86.321mb +0.00%  | 951.580ms +3.89% | ±1.79% +1751.56% |
| TextExtractorBench    | bench_extract_10k | 1    | 3   | 4.499mb +0.01%   | 38.202ms -0.73%  | ±2.02% +89.47%   |
| XmlExtractorBench     | bench_extract_10k | 1    | 3   | 4.494mb +0.01%   | 590.771ms -3.99% | ±0.80% -35.66%   |
+-----------------------+-------------------+------+-----+------------------+------------------+------------------+
Transformers
+---------------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| benchmark                       | subject                  | revs | its | mem_peak         | mode            | rstdev         |
+---------------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| RenameEntryTransformerBench     | bench_transform_10k_rows | 1    | 3   | 123.236mb +0.00% | 66.963ms +1.50% | ±1.67% +34.03% |
| RenameEachEntryTransformerBench | bench_transform_10k_rows | 1    | 3   | 18.498mb +0.00%  | 73.458ms +3.00% | ±1.14% -14.84% |
+---------------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
Loaders
+--------------------+----------------+------+-----+------------------+-----------------+-----------------+
| benchmark          | subject        | revs | its | mem_peak         | mode            | rstdev          |
+--------------------+----------------+------+-----+------------------+-----------------+-----------------+
| CSVLoaderBench     | bench_load_10k | 1    | 3   | 62.435mb +0.00%  | 87.324ms +1.02% | ±0.87% -16.30%  |
| JsonLoaderBench    | bench_load_10k | 1    | 3   | 79.706mb +0.00%  | 97.769ms +0.24% | ±0.56% -66.37%  |
| ParquetLoaderBench | bench_load_10k | 1    | 3   | 165.387mb +0.00% | 20.665s -0.06%  | ±0.05% +195.07% |
| TextLoaderBench    | bench_load_10k | 1    | 3   | 17.805mb +0.00%  | 30.754ms -1.48% | ±0.38% -16.79%  |
+--------------------+----------------+------+-----+------------------+-----------------+-----------------+
Building Blocks
+-------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| benchmark         | subject                    | revs | its | mem_peak         | mode             | rstdev          |
+-------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| EntryFactoryBench | bench_entry_factory        | 1    | 3   | 101.784mb +0.00% | 637.880ms -0.66% | ±0.26% -55.59%  |
| EntryFactoryBench | bench_entry_factory        | 1    | 3   | 53.134mb +0.00%  | 323.156ms -2.40% | ±0.83% -9.10%   |
| EntryFactoryBench | bench_entry_factory        | 1    | 3   | 14.384mb +0.00%  | 68.849ms -1.65%  | ±0.06% -93.98%  |
| RowsBench         | bench_chunk_10_on_10k      | 2    | 3   | 93.389mb +0.00%  | 3.317ms -0.15%   | ±0.64% -62.32%  |
| RowsBench         | bench_diff_left_1k_on_10k  | 2    | 3   | 110.758mb +0.00% | 236.200ms -1.65% | ±0.82% +283.33% |
| RowsBench         | bench_diff_right_1k_on_10k | 2    | 3   | 93.478mb +0.00%  | 23.653ms -0.96%  | ±0.28% -83.51%  |
| RowsBench         | bench_drop_1k_on_10k       | 2    | 3   | 94.264mb +0.00%  | 1.776ms +31.15%  | ±2.45% -3.21%   |
| RowsBench         | bench_drop_right_1k_on_10k | 2    | 3   | 94.264mb +0.00%  | 1.604ms +14.02%  | ±3.21% +245.89% |
| RowsBench         | bench_entries_on_10k       | 2    | 3   | 92.424mb +0.00%  | 3.868ms +15.22%  | ±1.09% +410.59% |
| RowsBench         | bench_filter_on_10k        | 2    | 3   | 92.953mb +0.00%  | 16.014ms +1.03%  | ±1.11% -11.58%  |
| RowsBench         | bench_find_on_10k          | 2    | 3   | 92.953mb +0.00%  | 15.632ms -0.75%  | ±0.41% -59.04%  |
| RowsBench         | bench_find_one_on_10k      | 10   | 3   | 91.642mb +0.00%  | 2.000μs +4.93%   | ±0.00% -100.00% |
| RowsBench         | bench_first_on_10k         | 10   | 3   | 91.642mb +0.00%  | 0.400μs +33.33%  | ±0.00% +0.00%   |
| RowsBench         | bench_flat_map_on_1k       | 2    | 3   | 100.703mb +0.00% | 14.652ms -0.11%  | ±1.45% +7.22%   |
| RowsBench         | bench_map_on_10k           | 2    | 3   | 130.130mb +0.00% | 68.681ms +0.99%  | ±0.54% -64.79%  |
| RowsBench         | bench_merge_1k_on_10k      | 2    | 3   | 93.473mb +0.00%  | 1.374ms +5.29%   | ±2.02% +53.81%  |
| RowsBench         | bench_partition_by_on_10k  | 2    | 3   | 96.841mb +0.00%  | 63.135ms +0.93%  | ±0.42% -49.46%  |
| RowsBench         | bench_remove_on_10k        | 2    | 3   | 94.526mb +0.00%  | 3.806ms +9.49%   | ±3.48% +646.90% |
| RowsBench         | bench_sort_asc_on_1k       | 2    | 3   | 92.003mb +0.00%  | 40.367ms +0.51%  | ±0.06% -96.67%  |
| RowsBench         | bench_sort_by_on_1k        | 2    | 3   | 92.004mb +0.00%  | 39.539ms -2.41%  | ±1.64% +44.38%  |
| RowsBench         | bench_sort_desc_on_1k      | 2    | 3   | 92.003mb +0.00%  | 39.456ms -2.94%  | ±1.49% -6.24%   |
| RowsBench         | bench_sort_entries_on_1k   | 2    | 3   | 94.085mb +0.00%  | 8.161ms +0.06%   | ±0.63% -61.77%  |
| RowsBench         | bench_sort_on_1k           | 2    | 3   | 91.835mb +0.00%  | 29.459ms -1.85%  | ±0.94% +10.04%  |
| RowsBench         | bench_take_1k_on_10k       | 10   | 3   | 91.642mb +0.00%  | 16.080μs +15.63% | ±1.05% +210.75% |
| RowsBench         | bench_take_right_1k_on_10k | 10   | 3   | 91.642mb +0.00%  | 17.388μs +11.61% | ±0.54% -31.82%  |
| RowsBench         | bench_unique_on_1k         | 2    | 3   | 110.759mb +0.00% | 240.047ms -1.05% | ±0.55% -9.73%   |
| TypeDetectorBench | bench_type_detector        | 1    | 3   | 42.070mb +0.00%  | 426.716ms -0.32% | ±0.35% -18.72%  |
| TypeDetectorBench | bench_type_detector        | 1    | 3   | 11.448mb +0.00%  | 85.149ms -0.91%  | ±0.80% -30.84%  |
+-------------------+----------------------------+------+-----+------------------+------------------+-----------------+

codecov bot commented May 19, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.07%. Comparing base (bd2f34b) to head (57f557f).
Report is 2 commits behind head on 1.x.

Additional details and impacted files
@@            Coverage Diff             @@
##              1.x    #1659      +/-   ##
==========================================
+ Coverage   82.06%   82.07%   +0.01%     
==========================================
  Files         703      703              
  Lines       19053    19053              
==========================================
+ Hits        15635    15637       +2     
+ Misses       3418     3416       -2     
| Components | Coverage Δ |
|---|---|
| etl | 88.27% <ø> (ø) |
| cli | 84.42% <ø> (ø) |
| lib-array-dot | 94.53% <ø> (ø) |
| lib-azure-sdk | 62.56% <ø> (ø) |
| lib-doctrine-dbal-bulk | 90.11% <ø> (ø) |
| lib-filesystem | 78.02% <ø> (ø) |
| lib-parquet | 84.37% <ø> (ø) |
| lib-parquet-viewer | 82.02% <ø> (ø) |
| lib-snappy | 90.69% <ø> (ø) |
| bridge-filesystem-async-aws | 90.38% <ø> (ø) |
| bridge-filesystem-azure | 89.92% <ø> (ø) |
| bridge-monolog-http | 96.38% <ø> (ø) |
| symfony-http-foundation | 74.41% <ø> (ø) |
| adapter-chartjs | 86.45% <ø> (ø) |
| adapter-csv | 89.57% <ø> (ø) |
| adapter-doctrine | 89.69% <ø> (ø) |
| adapter-elasticsearch | 97.19% <ø> (ø) |
| adapter-google-sheet | 83.87% <ø> (ø) |
| adapter-http | 59.15% <ø> (ø) |
| adapter-json | 90.62% <ø> (ø) |
| adapter-logger | 53.84% <ø> (ø) |
| adapter-meilisearch | 97.75% <ø> (ø) |
| adapter-parquet | 78.42% <ø> (ø) |
| adapter-text | 84.44% <ø> (ø) |
| adapter-xml | 83.15% <ø> (ø) |

@stloyd stloyd force-pushed the excel-extractor-not-closed branch from 8102527 to a55d7be Compare May 19, 2025 11:22
@stloyd stloyd force-pushed the excel-extractor-not-closed branch 3 times, most recently from c8e7680 to ec118cd Compare May 19, 2025 13:36
@stloyd stloyd requested a review from norberttech May 19, 2025 13:38
@stloyd stloyd force-pushed the excel-extractor-not-closed branch from ec118cd to d059e36 Compare May 20, 2025 15:06
@github-actions github-actions bot added size: S and removed size: XS labels May 20, 2025
@stloyd stloyd force-pushed the excel-extractor-not-closed branch from d059e36 to 83c9a0d Compare May 20, 2025 15:07
@stloyd stloyd requested a review from norberttech May 20, 2025 15:07
@stloyd stloyd requested a review from norberttech May 20, 2025 15:43
@norberttech norberttech merged commit 0a74411 into flow-php:1.x May 20, 2025
24 checks passed
@stloyd stloyd deleted the excel-extractor-not-closed branch May 20, 2025 16:06