Partitioning should not overwrite schema#1752
Conversation
Flow PHP - BenchmarksResults of the benchmarks from this PR are compared with the results from 1.x branch. Extractors+-----------------------+------------------------+------+-----+-----------------+------------------+-----------------+
| benchmark | subject | revs | its | mem_peak | mode | rstdev |
+-----------------------+------------------------+------+-----+-----------------+------------------+-----------------+
| CSVExtractorBench | bench_extract_10k | 1 | 3 | 4.846mb -0.00% | 437.282ms -0.62% | ±0.11% -67.95% |
| ExcelExtractorBench | bench_extract_10k_ods | 1 | 3 | 65.541mb -0.00% | 1.063s -0.05% | ±1.89% +91.16% |
| ExcelExtractorBench | bench_extract_10k_xlsx | 1 | 3 | 67.587mb -0.00% | 1.680s -0.41% | ±0.77% +15.89% |
| JsonExtractorBench | bench_extract_10k | 1 | 3 | 5.438mb -0.03% | 1.162s -1.66% | ±0.56% -71.35% |
| ParquetExtractorBench | bench_extract_10k | 1 | 3 | 86.399mb -0.00% | 901.605ms +0.07% | ±1.12% +266.17% |
| TextExtractorBench | bench_extract_10k | 1 | 3 | 4.569mb -0.00% | 42.004ms -0.95% | ±0.10% -91.25% |
| XmlExtractorBench | bench_extract_10k | 1 | 3 | 4.554mb -0.00% | 602.358ms -1.21% | ±0.90% -23.90% |
+-----------------------+------------------------+------+-----+-----------------+------------------+-----------------+
Transformers+---------------------------------+--------------------------+------+-----+------------------+-----------------+-----------------+
| benchmark | subject | revs | its | mem_peak | mode | rstdev |
+---------------------------------+--------------------------+------+-----+------------------+-----------------+-----------------+
| RenameEachEntryTransformerBench | bench_transform_10k_rows | 1 | 3 | 18.565mb -0.00% | 73.624ms +0.66% | ±1.23% +47.98% |
| RenameEntryTransformerBench | bench_transform_10k_rows | 1 | 3 | 123.303mb -0.00% | 67.322ms +0.54% | ±0.90% +600.18% |
+---------------------------------+--------------------------+------+-----+------------------+-----------------+-----------------+
Loaders+--------------------+----------------+------+-----+------------------+------------------+----------------+
| benchmark | subject | revs | its | mem_peak | mode | rstdev |
+--------------------+----------------+------+-----+------------------+------------------+----------------+
| CSVLoaderBench | bench_load_10k | 1 | 3 | 62.506mb +0.00% | 88.919ms +0.28% | ±0.43% -76.11% |
| JsonLoaderBench | bench_load_10k | 1 | 3 | 80.587mb +0.00% | 102.855ms -0.73% | ±0.52% +31.25% |
| ParquetLoaderBench | bench_load_10k | 1 | 3 | 166.299mb +0.00% | 2.012s +0.29% | ±0.30% -52.69% |
| TextLoaderBench | bench_load_10k | 1 | 3 | 17.871mb +0.00% | 30.673ms -2.85% | ±0.41% -55.45% |
+--------------------+----------------+------+-----+------------------+------------------+----------------+
Building Blocks+-------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| benchmark | subject | revs | its | mem_peak | mode | rstdev |
+-------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| TypeDetectorBench | bench_type_detector | 1 | 3 | 42.515mb +0.00% | 409.041ms +0.09% | ±0.85% +85.20% |
| TypeDetectorBench | bench_type_detector | 1 | 3 | 11.572mb +0.00% | 80.910ms -1.81% | ±0.58% -68.92% |
| EntryFactoryBench | bench_entry_factory | 1 | 3 | 105.984mb +0.00% | 649.393ms +0.19% | ±0.49% -56.10% |
| EntryFactoryBench | bench_entry_factory | 1 | 3 | 55.259mb +0.00% | 326.852ms +0.50% | ±0.63% -51.88% |
| EntryFactoryBench | bench_entry_factory | 1 | 3 | 14.845mb +0.00% | 69.885ms -1.07% | ±0.50% -33.37% |
| RowsBench | bench_chunk_10_on_10k | 2 | 3 | 93.455mb +0.00% | 3.500ms -2.81% | ±0.41% -80.01% |
| RowsBench | bench_diff_left_1k_on_10k | 2 | 3 | 110.826mb +0.00% | 243.040ms +1.55% | ±0.99% +165.58% |
| RowsBench | bench_diff_right_1k_on_10k | 2 | 3 | 93.546mb +0.00% | 24.463ms -0.57% | ±0.59% +53.73% |
| RowsBench | bench_drop_1k_on_10k | 2 | 3 | 94.330mb +0.00% | 1.520ms -12.19% | ±3.11% +1.97% |
| RowsBench | bench_drop_right_1k_on_10k | 2 | 3 | 94.330mb +0.00% | 1.442ms -19.49% | ±1.74% -51.98% |
| RowsBench | bench_entries_on_10k | 2 | 3 | 92.490mb +0.00% | 3.661ms -3.44% | ±1.99% -33.42% |
| RowsBench | bench_filter_on_10k | 2 | 3 | 93.019mb +0.00% | 16.177ms +1.09% | ±1.44% +39.55% |
| RowsBench | bench_find_on_10k | 2 | 3 | 93.019mb +0.00% | 15.615ms +0.85% | ±1.27% -20.16% |
| RowsBench | bench_find_one_on_10k | 10 | 3 | 91.708mb +0.00% | 1.994μs -0.30% | ±2.40% +0.00% |
| RowsBench | bench_first_on_10k | 10 | 3 | 91.708mb +0.00% | 0.400μs 0.00% | ±0.00% 0.00% |
| RowsBench | bench_flat_map_on_1k | 2 | 3 | 100.769mb +0.00% | 14.851ms -0.71% | ±2.32% -14.53% |
| RowsBench | bench_map_on_10k | 2 | 3 | 130.196mb +0.00% | 68.055ms -1.07% | ±1.08% +218.24% |
| RowsBench | bench_merge_1k_on_10k | 2 | 3 | 93.539mb +0.00% | 1.436ms -3.34% | ±0.68% -74.03% |
| RowsBench | bench_partition_by_on_10k | 2 | 3 | 96.908mb +0.00% | 64.933ms +2.10% | ±1.31% +50.33% |
| RowsBench | bench_remove_on_10k | 2 | 3 | 94.592mb +0.00% | 3.589ms -1.48% | ±2.61% +58.07% |
| RowsBench | bench_sort_asc_on_1k | 2 | 3 | 92.070mb +0.00% | 40.427ms -0.13% | ±1.36% -51.19% |
| RowsBench | bench_sort_by_on_1k | 2 | 3 | 92.070mb +0.00% | 39.577ms -5.26% | ±2.31% +65.13% |
| RowsBench | bench_sort_desc_on_1k | 2 | 3 | 92.070mb +0.00% | 40.167ms -0.80% | ±1.10% +213.14% |
| RowsBench | bench_sort_entries_on_1k | 2 | 3 | 94.151mb +0.00% | 8.227ms -3.01% | ±2.62% +227.16% |
| RowsBench | bench_sort_on_1k | 2 | 3 | 91.901mb +0.00% | 29.975ms -1.25% | ±1.81% +74.24% |
| RowsBench | bench_take_1k_on_10k | 10 | 3 | 91.708mb +0.00% | 14.442μs -3.01% | ±2.26% -26.62% |
| RowsBench | bench_take_right_1k_on_10k | 10 | 3 | 91.708mb +0.00% | 16.462μs -4.53% | ±2.30% +33.34% |
| RowsBench | bench_unique_on_1k | 2 | 3 | 110.827mb +0.00% | 241.318ms -1.44% | ±1.27% -12.67% |
+-------------------+----------------------------+------+-----+------------------+------------------+-----------------+
|
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## 1.x #1752 +/- ##
=======================================
Coverage 81.30% 81.30%
=======================================
Files 717 717
Lines 19949 19948 -1
=======================================
Hits 16219 16219
+ Misses 3730 3729 -1
🚀 New features to boost your workflow:
|
| { | ||
| $this->setupFiles([__FUNCTION__ => []]); | ||
|
|
||
| $fs = new FakeNativeLocalFilesystem(); |
There was a problem hiding this comment.
could this be replaced with src/lib/filesystem/src/Flow/Filesystem/Local/MemoryFilesystem.php maybe? I'm not sure how I feel about Fake Filesystem in ETL package
There was a problem hiding this comment.
Memory doesn't support mv() method, so it will unfortunately not help :(
There was a problem hiding this comment.
We could create a dummy to log action calls, if that copy is a problem.
There was a problem hiding this comment.
yeah, I tried before merging, ended up moving this double to Filesystem package
| if ($this->saveMode === SaveMode::Overwrite) { | ||
| if ($fileStream->path()->partitions()->count()) { | ||
| $partitionFilesPatter = new Path($fileStream->path()->parentDirectory()->path() . '/*', $fileStream->path()->options()); | ||
| $partitionFilesPatter = new Path($fileStream->path()->parentDirectory()->uri() . '/*', $fileStream->path()->options()); |
Resolves: #1750
Added new (in fact, it's a copy of the existing filesystem stream) filesystem with a custom scheme to confirm that the protocol is lost when partitioning is done.
Change Log
Added
Fixed
Changed
Removed
Deprecated
Security