-
-
Notifications
You must be signed in to change notification settings - Fork 48
Simplify Excel & GoogleSheet extractor cell expanding #1661
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| ) | ||
| ); | ||
| // Expand columns to the size of the previous row | ||
| for ($i = $rowDataCount; $i < $headersCount; $i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what will happen in situation like this:
- row 1 - 5 columns
- row 2 - 5 columns
- row 3 - 7 columns
- row 4 - 5 columns
the final will be like below?
- row 1 - 5 columns
- row 2 - 5 columns
- row 3 - 7 columns
- row 4 - 7 columns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think right now: yes.
This will be similar case for Google (& Excel) sheets, when we can enforce drop of later extra cells or hardly fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case I would skip that, users can simply use df()->read()->select(...) to make sure that only columns they need are present int the output and anything extra is removed, or they can use schema infer where those additional columns are going to be nullable and then they can pass inferred schema to extractor
Flow PHP - BenchmarksResults of the benchmarks from this PR are compared with the results from 1.x branch. Extractors+-----------------------+-------------------+------+-----+-----------------+------------------+-----------------+
| benchmark | subject | revs | its | mem_peak | mode | rstdev |
+-----------------------+-------------------+------+-----+-----------------+------------------+-----------------+
| CSVExtractorBench | bench_extract_10k | 1 | 3 | 4.775mb +0.00% | 605.105ms -0.12% | ±0.97% -60.78% |
| ExcelExtractorBench | bench_extract_10k | 1 | 3 | 75.288mb -0.00% | 1.513s -0.93% | ±0.58% -22.17% |
| JsonExtractorBench | bench_extract_10k | 1 | 3 | 5.018mb +0.00% | 1.286s +0.78% | ±0.14% -30.83% |
| ParquetExtractorBench | bench_extract_10k | 1 | 3 | 86.321mb +0.00% | 923.223ms -0.37% | ±3.16% +778.62% |
| TextExtractorBench | bench_extract_10k | 1 | 3 | 4.499mb +0.01% | 38.415ms -1.65% | ±0.19% -69.78% |
| XmlExtractorBench | bench_extract_10k | 1 | 3 | 4.494mb +0.01% | 603.395ms -0.97% | ±0.53% -48.66% |
+-----------------------+-------------------+------+-----+-----------------+------------------+-----------------+
Transformers+---------------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| benchmark | subject | revs | its | mem_peak | mode | rstdev |
+---------------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| RenameEntryTransformerBench | bench_transform_10k_rows | 1 | 3 | 123.236mb +0.00% | 65.549ms +0.15% | ±0.47% -67.06% |
| RenameEachEntryTransformerBench | bench_transform_10k_rows | 1 | 3 | 18.498mb +0.00% | 73.680ms +0.88% | ±1.17% +38.23% |
+---------------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
Loaders+--------------------+----------------+------+-----+------------------+-----------------+-----------------+
| benchmark | subject | revs | its | mem_peak | mode | rstdev |
+--------------------+----------------+------+-----+------------------+-----------------+-----------------+
| CSVLoaderBench | bench_load_10k | 1 | 3 | 62.435mb +0.00% | 84.642ms -3.02% | ±1.61% +143.46% |
| JsonLoaderBench | bench_load_10k | 1 | 3 | 79.706mb +0.00% | 95.775ms -2.20% | ±0.58% +85.85% |
| ParquetLoaderBench | bench_load_10k | 1 | 3 | 165.387mb +0.00% | 20.819s -0.03% | ±0.21% -51.94% |
| TextLoaderBench | bench_load_10k | 1 | 3 | 17.805mb +0.00% | 31.012ms -2.30% | ±2.34% +359.16% |
+--------------------+----------------+------+-----+------------------+-----------------+-----------------+
Building Blocks+-------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| benchmark | subject | revs | its | mem_peak | mode | rstdev |
+-------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| EntryFactoryBench | bench_entry_factory | 1 | 3 | 101.784mb +0.00% | 639.384ms -2.22% | ±0.39% -61.32% |
| EntryFactoryBench | bench_entry_factory | 1 | 3 | 53.134mb +0.00% | 326.760ms +1.03% | ±0.40% -83.10% |
| EntryFactoryBench | bench_entry_factory | 1 | 3 | 14.384mb +0.00% | 69.667ms -2.55% | ±1.45% +13.09% |
| RowsBench | bench_chunk_10_on_10k | 2 | 3 | 93.389mb +0.00% | 3.574ms -0.49% | ±3.10% -18.15% |
| RowsBench | bench_diff_left_1k_on_10k | 2 | 3 | 110.758mb +0.00% | 239.207ms +2.17% | ±0.50% +12.65% |
| RowsBench | bench_diff_right_1k_on_10k | 2 | 3 | 93.478mb +0.00% | 24.234ms +0.91% | ±0.34% +6.06% |
| RowsBench | bench_drop_1k_on_10k | 2 | 3 | 94.264mb +0.00% | 1.810ms +1.34% | ±3.52% +231.00% |
| RowsBench | bench_drop_right_1k_on_10k | 2 | 3 | 94.264mb +0.00% | 1.420ms -8.05% | ±3.18% +94.40% |
| RowsBench | bench_entries_on_10k | 2 | 3 | 92.424mb +0.00% | 3.498ms +1.50% | ±2.65% +170.88% |
| RowsBench | bench_filter_on_10k | 2 | 3 | 92.953mb +0.00% | 16.799ms +6.69% | ±0.46% -75.85% |
| RowsBench | bench_find_on_10k | 2 | 3 | 92.953mb +0.00% | 16.565ms +7.70% | ±0.98% -14.72% |
| RowsBench | bench_find_one_on_10k | 10 | 3 | 91.642mb +0.00% | 1.994μs +5.28% | ±2.40% -5.08% |
| RowsBench | bench_first_on_10k | 10 | 3 | 91.642mb +0.00% | 0.400μs 0.00% | ±0.00% 0.00% |
| RowsBench | bench_flat_map_on_1k | 2 | 3 | 100.703mb +0.00% | 14.439ms +2.11% | ±0.76% -43.10% |
| RowsBench | bench_map_on_10k | 2 | 3 | 130.130mb +0.00% | 68.735ms +4.40% | ±2.57% +515.65% |
| RowsBench | bench_merge_1k_on_10k | 2 | 3 | 93.473mb +0.00% | 1.370ms +22.99% | ±2.59% +44.03% |
| RowsBench | bench_partition_by_on_10k | 2 | 3 | 96.841mb +0.00% | 63.440ms +2.33% | ±1.17% -34.02% |
| RowsBench | bench_remove_on_10k | 2 | 3 | 94.526mb +0.00% | 3.640ms +3.93% | ±2.91% +202.08% |
| RowsBench | bench_sort_asc_on_1k | 2 | 3 | 92.003mb +0.00% | 40.354ms +1.10% | ±0.79% -27.56% |
| RowsBench | bench_sort_by_on_1k | 2 | 3 | 92.004mb +0.00% | 40.203ms -1.74% | ±0.45% -53.97% |
| RowsBench | bench_sort_desc_on_1k | 2 | 3 | 92.003mb +0.00% | 39.881ms -0.94% | ±2.46% +150.71% |
| RowsBench | bench_sort_entries_on_1k | 2 | 3 | 94.085mb +0.00% | 8.145ms -6.66% | ±1.16% +2.70% |
| RowsBench | bench_sort_on_1k | 2 | 3 | 91.835mb +0.00% | 29.572ms -2.69% | ±0.83% -60.48% |
| RowsBench | bench_take_1k_on_10k | 10 | 3 | 91.642mb +0.00% | 13.824μs -4.04% | ±1.35% +314.35% |
| RowsBench | bench_take_right_1k_on_10k | 10 | 3 | 91.642mb +0.00% | 16.496μs -0.74% | ±2.48% -29.39% |
| RowsBench | bench_unique_on_1k | 2 | 3 | 110.759mb +0.00% | 245.072ms +3.02% | ±0.42% +2.19% |
| TypeDetectorBench | bench_type_detector | 1 | 3 | 42.070mb +0.00% | 423.795ms +0.70% | ±0.36% -65.54% |
| TypeDetectorBench | bench_type_detector | 1 | 3 | 11.448mb +0.00% | 85.622ms +0.36% | ±0.38% -56.14% |
+-------------------+----------------------------+------+-----+------------------+------------------+-----------------+
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## 1.x #1661 +/- ##
==========================================
+ Coverage 82.04% 82.06% +0.01%
==========================================
Files 703 703
Lines 19068 19053 -15
==========================================
- Hits 15645 15635 -10
+ Misses 3423 3418 -5
🚀 New features to boost your workflow:
|
Change Log
Added
Fixed
Changed
Removed
Deprecated
Security
Description