Skip to content

Conversation

@stloyd
Copy link
Member

@stloyd stloyd commented May 19, 2025

Change Log

Added

Fixed

Changed

  • Simplify Excel & GoogleSheet extractor cell expanding

Removed

Deprecated

Security


Description

)
);
// Expand columns to the size of the previous row
for ($i = $rowDataCount; $i < $headersCount; $i++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what will happen in situation like this:

  • row 1 - 5 columns
  • row 2 - 5 columns
  • row 3 - 7 columns
  • row 4 - 5 columns

the final will be like below?

  • row 1 - 5 columns
  • row 2 - 5 columns
  • row 3 - 7 columns
  • row 4 - 7 columns

Copy link
Member Author

@stloyd stloyd May 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think right now: yes.

This will be similar case for Google (& Excel) sheets, when we can enforce drop of later extra cells or hardly fail.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case I would skip that, users can simply use df()->read()->select(...) to make sure that only columns they need are present int the output and anything extra is removed, or they can use schema infer where those additional columns are going to be nullable and then they can pass inferred schema to extractor

@github-actions
Copy link
Contributor

Flow PHP - Benchmarks

Results of the benchmarks from this PR are compared with the results from 1.x branch.

Extractors
+-----------------------+-------------------+------+-----+-----------------+------------------+-----------------+
| benchmark             | subject           | revs | its | mem_peak        | mode             | rstdev          |
+-----------------------+-------------------+------+-----+-----------------+------------------+-----------------+
| CSVExtractorBench     | bench_extract_10k | 1    | 3   | 4.775mb +0.00%  | 605.105ms -0.12% | ±0.97% -60.78%  |
| ExcelExtractorBench   | bench_extract_10k | 1    | 3   | 75.288mb -0.00% | 1.513s -0.93%    | ±0.58% -22.17%  |
| JsonExtractorBench    | bench_extract_10k | 1    | 3   | 5.018mb +0.00%  | 1.286s +0.78%    | ±0.14% -30.83%  |
| ParquetExtractorBench | bench_extract_10k | 1    | 3   | 86.321mb +0.00% | 923.223ms -0.37% | ±3.16% +778.62% |
| TextExtractorBench    | bench_extract_10k | 1    | 3   | 4.499mb +0.01%  | 38.415ms -1.65%  | ±0.19% -69.78%  |
| XmlExtractorBench     | bench_extract_10k | 1    | 3   | 4.494mb +0.01%  | 603.395ms -0.97% | ±0.53% -48.66%  |
+-----------------------+-------------------+------+-----+-----------------+------------------+-----------------+
Transformers
+---------------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| benchmark                       | subject                  | revs | its | mem_peak         | mode            | rstdev         |
+---------------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| RenameEntryTransformerBench     | bench_transform_10k_rows | 1    | 3   | 123.236mb +0.00% | 65.549ms +0.15% | ±0.47% -67.06% |
| RenameEachEntryTransformerBench | bench_transform_10k_rows | 1    | 3   | 18.498mb +0.00%  | 73.680ms +0.88% | ±1.17% +38.23% |
+---------------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
Loaders
+--------------------+----------------+------+-----+------------------+-----------------+-----------------+
| benchmark          | subject        | revs | its | mem_peak         | mode            | rstdev          |
+--------------------+----------------+------+-----+------------------+-----------------+-----------------+
| CSVLoaderBench     | bench_load_10k | 1    | 3   | 62.435mb +0.00%  | 84.642ms -3.02% | ±1.61% +143.46% |
| JsonLoaderBench    | bench_load_10k | 1    | 3   | 79.706mb +0.00%  | 95.775ms -2.20% | ±0.58% +85.85%  |
| ParquetLoaderBench | bench_load_10k | 1    | 3   | 165.387mb +0.00% | 20.819s -0.03%  | ±0.21% -51.94%  |
| TextLoaderBench    | bench_load_10k | 1    | 3   | 17.805mb +0.00%  | 31.012ms -2.30% | ±2.34% +359.16% |
+--------------------+----------------+------+-----+------------------+-----------------+-----------------+
Building Blocks
+-------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| benchmark         | subject                    | revs | its | mem_peak         | mode             | rstdev          |
+-------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| EntryFactoryBench | bench_entry_factory        | 1    | 3   | 101.784mb +0.00% | 639.384ms -2.22% | ±0.39% -61.32%  |
| EntryFactoryBench | bench_entry_factory        | 1    | 3   | 53.134mb +0.00%  | 326.760ms +1.03% | ±0.40% -83.10%  |
| EntryFactoryBench | bench_entry_factory        | 1    | 3   | 14.384mb +0.00%  | 69.667ms -2.55%  | ±1.45% +13.09%  |
| RowsBench         | bench_chunk_10_on_10k      | 2    | 3   | 93.389mb +0.00%  | 3.574ms -0.49%   | ±3.10% -18.15%  |
| RowsBench         | bench_diff_left_1k_on_10k  | 2    | 3   | 110.758mb +0.00% | 239.207ms +2.17% | ±0.50% +12.65%  |
| RowsBench         | bench_diff_right_1k_on_10k | 2    | 3   | 93.478mb +0.00%  | 24.234ms +0.91%  | ±0.34% +6.06%   |
| RowsBench         | bench_drop_1k_on_10k       | 2    | 3   | 94.264mb +0.00%  | 1.810ms +1.34%   | ±3.52% +231.00% |
| RowsBench         | bench_drop_right_1k_on_10k | 2    | 3   | 94.264mb +0.00%  | 1.420ms -8.05%   | ±3.18% +94.40%  |
| RowsBench         | bench_entries_on_10k       | 2    | 3   | 92.424mb +0.00%  | 3.498ms +1.50%   | ±2.65% +170.88% |
| RowsBench         | bench_filter_on_10k        | 2    | 3   | 92.953mb +0.00%  | 16.799ms +6.69%  | ±0.46% -75.85%  |
| RowsBench         | bench_find_on_10k          | 2    | 3   | 92.953mb +0.00%  | 16.565ms +7.70%  | ±0.98% -14.72%  |
| RowsBench         | bench_find_one_on_10k      | 10   | 3   | 91.642mb +0.00%  | 1.994μs +5.28%   | ±2.40% -5.08%   |
| RowsBench         | bench_first_on_10k         | 10   | 3   | 91.642mb +0.00%  | 0.400μs 0.00%    | ±0.00% 0.00%    |
| RowsBench         | bench_flat_map_on_1k       | 2    | 3   | 100.703mb +0.00% | 14.439ms +2.11%  | ±0.76% -43.10%  |
| RowsBench         | bench_map_on_10k           | 2    | 3   | 130.130mb +0.00% | 68.735ms +4.40%  | ±2.57% +515.65% |
| RowsBench         | bench_merge_1k_on_10k      | 2    | 3   | 93.473mb +0.00%  | 1.370ms +22.99%  | ±2.59% +44.03%  |
| RowsBench         | bench_partition_by_on_10k  | 2    | 3   | 96.841mb +0.00%  | 63.440ms +2.33%  | ±1.17% -34.02%  |
| RowsBench         | bench_remove_on_10k        | 2    | 3   | 94.526mb +0.00%  | 3.640ms +3.93%   | ±2.91% +202.08% |
| RowsBench         | bench_sort_asc_on_1k       | 2    | 3   | 92.003mb +0.00%  | 40.354ms +1.10%  | ±0.79% -27.56%  |
| RowsBench         | bench_sort_by_on_1k        | 2    | 3   | 92.004mb +0.00%  | 40.203ms -1.74%  | ±0.45% -53.97%  |
| RowsBench         | bench_sort_desc_on_1k      | 2    | 3   | 92.003mb +0.00%  | 39.881ms -0.94%  | ±2.46% +150.71% |
| RowsBench         | bench_sort_entries_on_1k   | 2    | 3   | 94.085mb +0.00%  | 8.145ms -6.66%   | ±1.16% +2.70%   |
| RowsBench         | bench_sort_on_1k           | 2    | 3   | 91.835mb +0.00%  | 29.572ms -2.69%  | ±0.83% -60.48%  |
| RowsBench         | bench_take_1k_on_10k       | 10   | 3   | 91.642mb +0.00%  | 13.824μs -4.04%  | ±1.35% +314.35% |
| RowsBench         | bench_take_right_1k_on_10k | 10   | 3   | 91.642mb +0.00%  | 16.496μs -0.74%  | ±2.48% -29.39%  |
| RowsBench         | bench_unique_on_1k         | 2    | 3   | 110.759mb +0.00% | 245.072ms +3.02% | ±0.42% +2.19%   |
| TypeDetectorBench | bench_type_detector        | 1    | 3   | 42.070mb +0.00%  | 423.795ms +0.70% | ±0.36% -65.54%  |
| TypeDetectorBench | bench_type_detector        | 1    | 3   | 11.448mb +0.00%  | 85.622ms +0.36%  | ±0.38% -56.14%  |
+-------------------+----------------------------+------+-----+------------------+------------------+-----------------+

@codecov
Copy link

codecov bot commented May 19, 2025

Codecov Report

Attention: Patch coverage is 87.50000% with 1 line in your changes missing coverage. Please review.

Project coverage is 82.06%. Comparing base (b48198c) to head (10fc976).
Report is 2 commits behind head on 1.x.

Additional details and impacted files
@@            Coverage Diff             @@
##              1.x    #1661      +/-   ##
==========================================
+ Coverage   82.04%   82.06%   +0.01%     
==========================================
  Files         703      703              
  Lines       19068    19053      -15     
==========================================
- Hits        15645    15635      -10     
+ Misses       3423     3418       -5     
Components Coverage Δ
etl 88.27% <ø> (ø)
cli 84.42% <ø> (ø)
lib-array-dot 94.53% <ø> (ø)
lib-azure-sdk 62.56% <ø> (ø)
lib-doctrine-dbal-bulk 90.11% <ø> (ø)
lib-filesystem 78.02% <ø> (ø)
lib-parquet 84.37% <ø> (ø)
lib-parquet-viewer 82.02% <ø> (ø)
lib-snappy 90.69% <ø> (-0.47%) ⬇️
bridge-filesystem-async-aws 90.38% <ø> (ø)
bridge-filesystem-azure 89.92% <ø> (ø)
bridge-monolog-http 96.38% <ø> (ø)
symfony-http-foundation 74.41% <ø> (ø)
adapter-chartjs 86.45% <ø> (ø)
adapter-csv 89.57% <ø> (ø)
adapter-doctrine 89.69% <ø> (ø)
adapter-elasticsearch 97.19% <ø> (ø)
adapter-google-sheet 83.87% <50.00%> (+3.87%) ⬆️
adapter-http 59.15% <ø> (ø)
adapter-json 90.62% <ø> (ø)
adapter-logger 53.84% <ø> (ø)
adapter-meilisearch 97.75% <ø> (ø)
adapter-parquet 78.42% <ø> (ø)
adapter-text 84.44% <ø> (ø)
adapter-xml 83.15% <ø> (ø)
🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@stloyd stloyd requested a review from norberttech May 19, 2025 13:38
@norberttech norberttech enabled auto-merge (squash) May 19, 2025 13:40
@norberttech norberttech merged commit bd2f34b into flow-php:1.x May 19, 2025
21 checks passed
@stloyd stloyd deleted the simplify-row-expand branch May 19, 2025 13:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants