@norberttech
Member

Change Log

Added

  • Validator for the Parquet Writer

Fixed

Changed

Removed

Deprecated

Security


Description

Closes: #757

The main goal of this validator is to confirm that every required column is set in each row and that required columns never receive null values.
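
The two checks described above can be sketched roughly as follows. This is only an illustration in Python, not the Flow PHP implementation: `Column`, `ValidationError`, and `validate_row` are hypothetical names, and the real Parquet schema (nested and repeated columns, for example) is likely more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical schema column: a name plus a required flag."""
    name: str
    required: bool = True


class ValidationError(ValueError):
    """Raised when a row does not satisfy the schema."""


def validate_row(schema, row):
    """Check one row (a dict) against a flat list of columns.

    Only required columns are checked: each must be present in the row
    and must hold a non-null value. Optional columns may be absent or null.
    """
    for column in schema:
        if not column.required:
            continue
        if column.name not in row:
            raise ValidationError(
                f"required column '{column.name}' is missing from row"
            )
        if row[column.name] is None:
            raise ValidationError(
                f"required column '{column.name}' must not be null"
            )
```

With a schema such as `[Column("id"), Column("email", required=False)]`, the row `{"id": 1}` passes, while `{"email": "x"}` and `{"id": None}` both raise `ValidationError`.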

@github-actions
Contributor

Flow PHP - Benchmarks

Results of the benchmarks from this PR are compared with the results from the 1.x branch.

Extractors
+-----------------------+-------------------+------+-----+------------------+------------------+-----------------+
| benchmark             | subject           | revs | its | mem_peak         | mode             | rstdev          |
+-----------------------+-------------------+------+-----+------------------+------------------+-----------------+
| AvroExtractorBench    | bench_extract_10k | 1    | 3   | 34.745mb +0.00%  | 1.132s -1.12%    | ±0.93% -22.95%  |
| CSVExtractorBench     | bench_extract_10k | 1    | 3   | 4.604mb +0.04%   | 303.457ms -1.55% | ±3.00% -6.45%   |
| JsonExtractorBench    | bench_extract_10k | 1    | 3   | 4.769mb +0.04%   | 1.392s -0.95%    | ±1.99% +173.51% |
| ParquetExtractorBench | bench_extract_10k | 1    | 3   | 239.474mb +0.00% | 1.570s -0.81%    | ±1.36% +802.09% |
| TextExtractorBench    | bench_extract_10k | 1    | 3   | 4.558mb +0.04%   | 24.300ms -2.86%  | ±1.69% +149.88% |
| XmlExtractorBench     | bench_extract_10k | 1    | 3   | 4.558mb +0.04%   | 404.153ms +0.34% | ±0.30% +37.60%  |
+-----------------------+-------------------+------+-----+------------------+------------------+-----------------+
Transformers
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| benchmark                   | subject                  | revs | its | mem_peak         | mode            | rstdev         |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| RenameEntryTransformerBench | bench_transform_10k_rows | 1    | 3   | 110.245mb +0.00% | 64.868ms +2.56% | ±0.85% -54.59% |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
Loaders
+--------------------+----------------+------+-----+------------------+------------------+-----------------+
| benchmark          | subject        | revs | its | mem_peak         | mode             | rstdev          |
+--------------------+----------------+------+-----+------------------+------------------+-----------------+
| AvroLoaderBench    | bench_load_10k | 1    | 3   | 94.726mb +0.00%  | 442.100ms +0.88% | ±1.04% +55.52%  |
| CSVLoaderBench     | bench_load_10k | 1    | 3   | 54.710mb +0.00%  | 70.645ms +0.92%  | ±0.42% -44.55%  |
| JsonLoaderBench    | bench_load_10k | 1    | 3   | 105.308mb +0.00% | 54.677ms +2.40%  | ±0.36% -62.14%  |
| ParquetLoaderBench | bench_load_10k | 1    | 3   | 320.781mb +0.00% | 1.482s +5.67%    | ±0.59% +230.76% |
| TextLoaderBench    | bench_load_10k | 1    | 3   | 17.588mb +0.01%  | 41.400ms +0.21%  | ±0.42% -39.74%  |
+--------------------+----------------+------+-----+------------------+------------------+-----------------+
Building Blocks
+-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| benchmark               | subject                    | revs | its | mem_peak         | mode             | rstdev          |
+-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| RowsBench               | bench_chunk_10_on_10k      | 2    | 3   | 76.292mb +0.00%  | 2.148ms -1.65%   | ±2.14% +39.13%  |
| RowsBench               | bench_diff_left_1k_on_10k  | 2    | 3   | 96.082mb +0.00%  | 182.383ms +1.88% | ±0.47% +23.06%  |
| RowsBench               | bench_diff_right_1k_on_10k | 2    | 3   | 74.608mb +0.00%  | 18.224ms +2.92%  | ±0.39% -30.85%  |
| RowsBench               | bench_drop_1k_on_10k       | 2    | 3   | 75.430mb +0.00%  | 1.822ms +13.74%  | ±2.78% +100.90% |
| RowsBench               | bench_drop_right_1k_on_10k | 2    | 3   | 75.430mb +0.00%  | 1.674ms +2.92%   | ±1.70% -25.68%  |
| RowsBench               | bench_entries_on_10k       | 2    | 3   | 74.644mb +0.00%  | 2.489ms +0.80%   | ±0.70% -58.81%  |
| RowsBench               | bench_filter_on_10k        | 2    | 3   | 75.173mb +0.00%  | 14.294ms +3.23%  | ±1.55% +143.63% |
| RowsBench               | bench_find_on_10k          | 2    | 3   | 75.173mb +0.00%  | 14.219ms +3.70%  | ±0.67% -35.23%  |
| RowsBench               | bench_find_one_on_10k      | 10   | 3   | 73.075mb +0.00%  | 1.606μs 0.00%    | ±2.89% 0.00%    |
| RowsBench               | bench_first_on_10k         | 10   | 3   | 73.075mb +0.00%  | 0.400μs +33.33%  | ±0.00% +0.00%   |
| RowsBench               | bench_flat_map_on_1k       | 2    | 3   | 86.632mb +0.00%  | 12.897ms +1.96%  | ±1.81% +354.26% |
| RowsBench               | bench_map_on_10k           | 2    | 3   | 115.992mb +0.00% | 64.272ms -0.77%  | ±2.46% +38.87%  |
| RowsBench               | bench_merge_1k_on_10k      | 2    | 3   | 75.693mb +0.00%  | 1.788ms +3.89%   | ±2.67% +12.34%  |
| RowsBench               | bench_partition_by_on_10k  | 2    | 3   | 77.961mb +0.00%  | 32.895ms +2.28%  | ±0.59% -56.93%  |
| RowsBench               | bench_remove_on_10k        | 2    | 3   | 77.794mb +0.00%  | 4.466ms -1.99%   | ±1.45% -51.86%  |
| RowsBench               | bench_sort_asc_on_1k       | 2    | 3   | 73.218mb +0.00%  | 38.341ms -0.33%  | ±0.65% +51.90%  |
| RowsBench               | bench_sort_by_on_1k        | 2    | 3   | 73.219mb +0.00%  | 38.693ms +0.45%  | ±0.13% -89.10%  |
| RowsBench               | bench_sort_desc_on_1k      | 2    | 3   | 73.218mb +0.00%  | 39.370ms +3.16%  | ±0.64% -39.74%  |
| RowsBench               | bench_sort_entries_on_1k   | 2    | 3   | 75.518mb +0.00%  | 7.435ms +2.24%   | ±0.78% -20.22%  |
| RowsBench               | bench_sort_on_1k           | 2    | 3   | 73.076mb +0.00%  | 29.032ms +2.62%  | ±0.26% -50.39%  |
| RowsBench               | bench_take_1k_on_10k       | 10   | 3   | 73.075mb +0.00%  | 12.883μs -0.18%  | ±1.66% +355.91% |
| RowsBench               | bench_take_right_1k_on_10k | 10   | 3   | 73.075mb +0.00%  | 16.679μs +5.45%  | ±2.76% +100.24% |
| RowsBench               | bench_unique_on_1k         | 2    | 3   | 96.083mb +0.00%  | 184.627ms +2.19% | ±2.21% +87.32%  |
| TypeDetectorBench       | bench_type_detector        | 1    | 3   | 98.212mb +0.00%  | 947.471ms -0.18% | ±1.03% -59.84%  |
| TypeDetectorBench       | bench_type_detector        | 1    | 3   | 21.904mb +0.01%  | 188.805ms -2.75% | ±0.30% +21.50%  |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 115.836mb +0.00% | 771.782ms -0.68% | ±0.84% -68.17%  |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 59.554mb +0.00%  | 381.244ms +0.48% | ±0.81% +98.80%  |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 14.676mb +0.01%  | 77.913ms +0.64%  | ±1.14% -25.30%  |
+-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+

@norberttech norberttech merged commit 2e20054 into flow-php:1.x Nov 20, 2023


Development

Successfully merging this pull request may close these issues.

Parquet Writer - Schema Validation
