@stloyd
Member

@stloyd stloyd commented Apr 28, 2025

Change Log

Added

Fixed

Changed

  • [GoogleSheet] Prevent fatal error when extracted columns don't match headers amount

Removed

Deprecated

Security


Description

In edge cases where the given spreadsheet has stray columns filled in (e.g., comments from the reporter), the number of extracted columns no longer matches the number of headers, so the headers cannot be combined with the row values. This leads to a fatal error:

array_combine(): Argument #1 ($keys) and argument #2 ($values) must have the same number of elements
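
For illustration, here is a minimal sketch of how the mismatch arises and one way to guard against it, assuming PHP 8.0+ (where `array_combine()` throws a `ValueError` on mismatched lengths). The helper `combine_row_with_headers()` and the sample data are hypothetical and not taken from this PR or from Flow PHP; the actual fix in the adapter may handle the mismatch differently.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of Flow PHP): fit a row to the header count
 * before combining, so array_combine() never receives mismatched lengths.
 *
 * @param list<string> $headers
 * @param list<mixed>  $row
 *
 * @return array<string, mixed>
 */
function combine_row_with_headers(array $headers, array $row): array
{
    $count = \count($headers);

    // Drop cells beyond the last header (e.g. stray comments in extra columns).
    $row = \array_slice($row, 0, $count);

    // Pad short rows with null so every header still gets a value.
    $row = \array_pad($row, $count, null);

    return \array_combine($headers, $row);
}

// Assumed sample data: two headers, but the row carries a third, stray cell.
$headers = ['id', 'name'];
$row = [1, 'Alice', 'comment from reporter'];

try {
    // On PHP 8.0+ this throws a ValueError because the lengths differ.
    \array_combine($headers, $row);
} catch (\ValueError $e) {
    echo $e->getMessage(), PHP_EOL;
}

// The guarded version keeps only the cells that have a matching header.
\var_dump(combine_row_with_headers($headers, $row));
```

Truncating and padding is only one possible policy. Silently dropping cells hides malformed input, so a stricter variant could log or reject such rows instead.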

@github-actions
Contributor

github-actions bot commented Apr 28, 2025

Flow PHP - Benchmarks

Results of the benchmarks from this PR are compared with the results from 1.x branch.

Extractors
+-----------------------+-------------------+------+-----+-----------------+------------------+-----------------+
| benchmark             | subject           | revs | its | mem_peak        | mode             | rstdev          |
+-----------------------+-------------------+------+-----+-----------------+------------------+-----------------+
| CSVExtractorBench     | bench_extract_10k | 1    | 3   | 4.882mb +0.02%  | 620.326ms +0.89% | ±0.97% -16.58%  |
| JsonExtractorBench    | bench_extract_10k | 1    | 3   | 5.467mb +0.05%  | 1.336s -1.91%    | ±0.18% -54.45%  |
| ParquetExtractorBench | bench_extract_10k | 1    | 3   | 86.479mb +0.00% | 928.598ms -1.39% | ±1.20% +226.00% |
| TextExtractorBench    | bench_extract_10k | 1    | 3   | 4.606mb +0.02%  | 40.162ms +3.23%  | ±1.42% +18.69%  |
| XmlExtractorBench     | bench_extract_10k | 1    | 3   | 4.581mb +0.02%  | 602.786ms -0.79% | ±0.90% +4.13%   |
+-----------------------+-------------------+------+-----+-----------------+------------------+-----------------+
Transformers
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| benchmark                   | subject                  | revs | its | mem_peak         | mode            | rstdev         |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| RenameEntryTransformerBench | bench_transform_10k_rows | 1    | 3   | 127.404mb +0.00% | 69.431ms -2.58% | ±1.36% -28.28% |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
Loaders
+--------------------+----------------+------+-----+------------------+------------------+----------------+
| benchmark          | subject        | revs | its | mem_peak         | mode             | rstdev         |
+--------------------+----------------+------+-----+------------------+------------------+----------------+
| CSVLoaderBench     | bench_load_10k | 1    | 3   | 64.049mb +0.00%  | 104.088ms -0.18% | ±1.14% +33.20% |
| JsonLoaderBench    | bench_load_10k | 1    | 3   | 84.081mb +0.00%  | 95.933ms -1.64%  | ±0.27% -67.49% |
| ParquetLoaderBench | bench_load_10k | 1    | 3   | 166.563mb +0.00% | 20.722s -3.86%   | ±0.25% -69.07% |
| TextLoaderBench    | bench_load_10k | 1    | 3   | 18.139mb +0.00%  | 31.525ms +0.18%  | ±0.54% -27.53% |
+--------------------+----------------+------+-----+------------------+------------------+----------------+
Building Blocks
+-------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| benchmark         | subject                    | revs | its | mem_peak         | mode             | rstdev          |
+-------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| RowsBench         | bench_chunk_10_on_10k      | 2    | 3   | 97.066mb +0.00%  | 3.188ms -23.14%  | ±0.95% -70.24%  |
| RowsBench         | bench_diff_left_1k_on_10k  | 2    | 3   | 114.422mb +0.00% | 180.561ms -2.51% | ±0.68% +2.01%   |
| RowsBench         | bench_diff_right_1k_on_10k | 2    | 3   | 97.142mb +0.00%  | 18.165ms -4.85%  | ±0.69% -18.88%  |
| RowsBench         | bench_drop_1k_on_10k       | 2    | 3   | 97.941mb +0.00%  | 1.516ms -30.41%  | ±3.41% +323.45% |
| RowsBench         | bench_drop_right_1k_on_10k | 2    | 3   | 97.941mb +0.00%  | 1.513ms -31.48%  | ±2.49% +244.22% |
| RowsBench         | bench_entries_on_10k       | 2    | 3   | 96.101mb +0.00%  | 4.321ms -28.38%  | ±2.19% -1.17%   |
| RowsBench         | bench_filter_on_10k        | 2    | 3   | 96.630mb +0.00%  | 16.581ms -3.46%  | ±0.84% -16.01%  |
| RowsBench         | bench_find_on_10k          | 2    | 3   | 96.630mb +0.00%  | 16.491ms -2.86%  | ±0.49% -24.32%  |
| RowsBench         | bench_find_one_on_10k      | 10   | 3   | 95.322mb +0.00%  | 1.706μs -14.70%  | ±2.72% +0.00%   |
| RowsBench         | bench_first_on_10k         | 10   | 3   | 95.322mb +0.00%  | 0.400μs -20.00%  | ±0.00% +0.00%   |
| RowsBench         | bench_flat_map_on_1k       | 2    | 3   | 104.540mb +0.00% | 15.026ms -9.51%  | ±0.33% -86.91%  |
| RowsBench         | bench_map_on_10k           | 2    | 3   | 134.607mb +0.00% | 70.827ms -5.99%  | ±1.09% +48.75%  |
| RowsBench         | bench_merge_1k_on_10k      | 2    | 3   | 97.150mb +0.00%  | 1.334ms -35.48%  | ±3.50% -5.73%   |
| RowsBench         | bench_partition_by_on_10k  | 2    | 3   | 100.522mb +0.00% | 63.310ms -0.77%  | ±1.04% +159.54% |
| RowsBench         | bench_remove_on_10k        | 2    | 3   | 98.203mb +0.00%  | 3.775ms -7.12%   | ±0.73% -66.94%  |
| RowsBench         | bench_sort_asc_on_1k       | 2    | 3   | 95.684mb +0.00%  | 40.148ms -0.81%  | ±0.35% -88.82%  |
| RowsBench         | bench_sort_by_on_1k        | 2    | 3   | 95.685mb +0.00%  | 40.913ms -2.08%  | ±2.27% +259.68% |
| RowsBench         | bench_sort_desc_on_1k      | 2    | 3   | 95.684mb +0.00%  | 41.189ms -1.33%  | ±0.10% -89.91%  |
| RowsBench         | bench_sort_entries_on_1k   | 2    | 3   | 97.762mb +0.00%  | 8.384ms -1.28%   | ±0.85% -56.88%  |
| RowsBench         | bench_sort_on_1k           | 2    | 3   | 95.512mb +0.00%  | 30.024ms -1.51%  | ±2.76% +50.60%  |
| RowsBench         | bench_take_1k_on_10k       | 10   | 3   | 95.322mb +0.00%  | 14.562μs +1.10%  | ±2.61% -5.06%   |
| RowsBench         | bench_take_right_1k_on_10k | 10   | 3   | 95.322mb +0.00%  | 16.796μs -6.12%  | ±3.65% +43.28%  |
| RowsBench         | bench_unique_on_1k         | 2    | 3   | 114.423mb +0.00% | 187.191ms -1.50% | ±0.76% +116.95% |
| TypeDetectorBench | bench_type_detector        | 1    | 3   | 44.047mb +0.00%  | 466.969ms -0.02% | ±0.92% -51.82%  |
| TypeDetectorBench | bench_type_detector        | 1    | 3   | 11.857mb +0.01%  | 94.023ms -0.33%  | ±0.78% -23.59%  |
| EntryFactoryBench | bench_entry_factory        | 1    | 3   | 105.512mb +0.00% | 716.200ms -0.70% | ±0.44% +21.50%  |
| EntryFactoryBench | bench_entry_factory        | 1    | 3   | 55.022mb +0.00%  | 370.531ms +2.14% | ±1.08% -38.16%  |
| EntryFactoryBench | bench_entry_factory        | 1    | 3   | 14.800mb +0.01%  | 81.247ms +3.80%  | ±0.65% -26.44%  |
+-------------------+----------------------------+------+-----+------------------+------------------+-----------------+

@stloyd stloyd force-pushed the google-drop-extra branch from 192ff1f to bb65616 Compare April 28, 2025 07:18
@github-actions github-actions bot added size: S and removed size: XS labels Apr 28, 2025
@codecov

codecov bot commented Apr 28, 2025

Codecov Report

Attention: Patch coverage is 91.66667% with 1 line in your changes missing coverage. Please review.

Project coverage is 83.21%. Comparing base (2cae774) to head (cb0c7c3).
Report is 2 commits behind head on 1.x.

Additional details and impacted files
@@            Coverage Diff             @@
##              1.x    #1606      +/-   ##
==========================================
+ Coverage   83.20%   83.21%   +0.01%     
==========================================
  Files         703      703              
  Lines       19053    19060       +7     
==========================================
+ Hits        15853    15861       +8     
+ Misses       3200     3199       -1     
+-----------------------------+-----------------------------+
| Components                  | Coverage Δ                  |
+-----------------------------+-----------------------------+
| etl                         | 86.25% <ø> (ø)              |
| cli                         | 84.59% <ø> (ø)              |
| lib-array-dot               | 94.53% <ø> (ø)              |
| lib-azure-sdk               | 62.56% <ø> (ø)              |
| lib-doctrine-dbal-bulk      | 90.11% <ø> (ø)              |
| lib-filesystem              | 78.02% <ø> (ø)              |
| lib-parquet                 | 84.36% <ø> (ø)              |
| lib-parquet-viewer          | 82.02% <ø> (ø)              |
| lib-snappy                  | 91.16% <ø> (ø)              |
| bridge-filesystem-async-aws | 90.38% <ø> (ø)              |
| bridge-filesystem-azure     | 89.92% <ø> (ø)              |
| bridge-monolog-http         | 96.38% <ø> (ø)              |
| symfony-http-foundation     | 74.41% <ø> (ø)              |
| adapter-chartjs             | 86.45% <ø> (ø)              |
| adapter-csv                 | 89.57% <ø> (ø)              |
| adapter-doctrine            | 89.51% <ø> (ø)              |
| adapter-elasticsearch       | 97.19% <ø> (ø)              |
| adapter-google-sheet        | 80.00% <91.66%> (+1.95%) ⬆️ |
| adapter-http                | 59.15% <ø> (ø)              |
| adapter-json                | 90.62% <ø> (ø)              |
| adapter-logger              | 53.84% <ø> (ø)              |
| adapter-meilisearch         | 97.75% <ø> (ø)              |
| adapter-parquet             | 80.85% <ø> (ø)              |
| adapter-text                | 84.44% <ø> (ø)              |
| adapter-xml                 | 83.15% <ø> (ø)              |
+-----------------------------+-----------------------------+

@stloyd stloyd force-pushed the google-drop-extra branch from bb65616 to 96f1d7f Compare April 28, 2025 11:31
@stloyd stloyd force-pushed the google-drop-extra branch from 96f1d7f to ab09964 Compare April 28, 2025 11:41
@stloyd stloyd changed the title [GoogleSheet] Add option to drop extra columns from extracted spreadsheets [GoogleSheet] Prevent fatal error when extracted columns don't match headers amount Apr 28, 2025
@stloyd stloyd requested a review from norberttech April 28, 2025 12:18
@norberttech norberttech merged commit 7814d70 into flow-php:1.x Apr 28, 2025
22 checks passed
@stloyd stloyd deleted the google-drop-extra branch April 28, 2025 13:02
@norberttech norberttech added this to the 0.15.0 milestone Apr 28, 2025
@norberttech norberttech moved this from Todo to Done in Roadmap May 1, 2025