Skip to content

Conversation

@norberttech
Copy link
Member

Change Log

Added

Fixed

Changed

Removed

  • Removed null type

Deprecated

Security


Description

Null as a EntryType was always problematic, mostly due to simple fact that in most data sources/sinks it simply does not exists as a standalone type. For example in databases, we can have nullable columns but not null fields.
Same with Parquet and Avro, we can have nullable strings/integeres/maps/etc but we don't have null column type.

That was creating a very inconsistent behavior when reading from those strongly typed data srouces.
Reading a nullable boolean column from parquet where all values are null we would get NullEntry in Flow Schema, instead of getting BoolEntry that is nullable. That was similarly problematic when we wanted to write into parquet a nullable column with only null values in a given row group.

Another indication that this change was necessary is how well it worked with Entry Type while creating entries.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 31, 2024

Flow PHP - Benchmarks

Results of the benchmarks from this PR are compared with the results from 1.x branch.

Extractors
+-----------------------+-------------------+------+-----+------------------+------------------+-----------------+
| benchmark             | subject           | revs | its | mem_peak         | mode             | rstdev          |
+-----------------------+-------------------+------+-----+------------------+------------------+-----------------+
| AvroExtractorBench    | bench_extract_10k | 1    | 3   | 35.282mb +0.01%  | 838.205ms +0.96% | ±1.63% -39.84%  |
| CSVExtractorBench     | bench_extract_10k | 1    | 3   | 5.005mb +0.04%   | 341.535ms -0.23% | ±0.52% +688.28% |
| JsonExtractorBench    | bench_extract_10k | 1    | 3   | 5.156mb +0.11%   | 1.066s +2.13%    | ±0.63% +140.39% |
| ParquetExtractorBench | bench_extract_10k | 1    | 3   | 135.831mb -0.00% | 905.395ms +1.11% | ±0.91% -8.00%   |
| TextExtractorBench    | bench_extract_10k | 1    | 3   | 4.913mb +0.09%   | 35.967ms +1.51%  | ±1.12% +96.31%  |
| XmlExtractorBench     | bench_extract_10k | 1    | 3   | 4.919mb +0.09%   | 433.765ms -1.68% | ±0.40% -61.67%  |
+-----------------------+-------------------+------+-----+------------------+------------------+-----------------+
Transformers
+-----------------------------+--------------------------+------+-----+------------------+-----------------+---------------+
| benchmark                   | subject                  | revs | its | mem_peak         | mode            | rstdev        |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+---------------+
| RenameEntryTransformerBench | bench_transform_10k_rows | 1    | 3   | 110.619mb +5.07% | 60.784ms -7.74% | ±0.74% -5.95% |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+---------------+
Loaders
+--------------------+----------------+------+-----+------------------+------------------+-----------------+
| benchmark          | subject        | revs | its | mem_peak         | mode             | rstdev          |
+--------------------+----------------+------+-----+------------------+------------------+-----------------+
| AvroLoaderBench    | bench_load_10k | 1    | 3   | 95.663mb +1.06%  | 457.220ms -3.61% | ±0.54% -45.75%  |
| CSVLoaderBench     | bench_load_10k | 1    | 3   | 54.144mb +1.85%  | 68.352ms -6.42%  | ±0.41% -45.81%  |
| JsonLoaderBench    | bench_load_10k | 1    | 3   | 106.570mb +0.95% | 50.106ms -6.15%  | ±1.21% +55.96%  |
| ParquetLoaderBench | bench_load_10k | 1    | 3   | 224.394mb +1.16% | 1.397s -3.24%    | ±0.42% -23.12%  |
| TextLoaderBench    | bench_load_10k | 1    | 3   | 17.960mb +0.03%  | 39.183ms -5.39%  | ±1.10% +825.95% |
+--------------------+----------------+------+-----+------------------+------------------+-----------------+
Building Blocks
+-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| benchmark               | subject                    | revs | its | mem_peak         | mode             | rstdev          |
+-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| RowsBench               | bench_chunk_10_on_10k      | 2    | 3   | 76.686mb +13.52% | 3.219ms -5.30%   | ±1.43% +445.59% |
| RowsBench               | bench_diff_left_1k_on_10k  | 2    | 3   | 96.413mb +6.47%  | 187.699ms +0.22% | ±1.15% +217.23% |
| RowsBench               | bench_diff_right_1k_on_10k | 2    | 3   | 74.938mb +13.92% | 18.926ms -1.80%  | ±0.69% -24.00%  |
| RowsBench               | bench_drop_1k_on_10k       | 2    | 3   | 77.926mb +13.30% | 1.654ms -10.28%  | ±0.87% -76.41%  |
| RowsBench               | bench_drop_right_1k_on_10k | 2    | 3   | 77.926mb +13.30% | 1.639ms -17.40%  | ±0.71% -69.47%  |
| RowsBench               | bench_entries_on_10k       | 2    | 3   | 75.038mb +13.81% | 2.464ms -10.41%  | ±2.81% +383.25% |
| RowsBench               | bench_filter_on_10k        | 2    | 3   | 75.567mb +13.72% | 16.100ms +7.24%  | ±0.87% +206.49% |
| RowsBench               | bench_find_on_10k          | 2    | 3   | 75.567mb +13.72% | 16.326ms +6.85%  | ±0.71% -48.81%  |
| RowsBench               | bench_find_one_on_10k      | 10   | 3   | 73.471mb +14.11% | 1.706μs -5.22%   | ±2.72% +0.00%   |
| RowsBench               | bench_first_on_10k         | 10   | 3   | 73.471mb +14.11% | 0.400μs 0.00%    | ±0.00% 0.00%    |
| RowsBench               | bench_flat_map_on_1k       | 2    | 3   | 87.025mb +7.08%  | 12.717ms -2.75%  | ±0.80% +340.88% |
| RowsBench               | bench_map_on_10k           | 2    | 3   | 116.386mb +5.30% | 60.666ms -10.05% | ±0.66% -24.34%  |
| RowsBench               | bench_merge_1k_on_10k      | 2    | 3   | 76.086mb +13.62% | 1.193ms -19.06%  | ±1.03% -62.47%  |
| RowsBench               | bench_partition_by_on_10k  | 2    | 3   | 79.433mb +13.05% | 62.706ms +4.58%  | ±2.21% +925.90% |
| RowsBench               | bench_remove_on_10k        | 2    | 3   | 78.188mb +13.26% | 3.792ms -7.16%   | ±0.48% -42.91%  |
| RowsBench               | bench_sort_asc_on_1k       | 2    | 3   | 73.549mb +14.09% | 40.137ms -0.46%  | ±0.98% +221.08% |
| RowsBench               | bench_sort_by_on_1k        | 2    | 3   | 73.550mb +14.09% | 40.048ms -1.66%  | ±0.65% +43.45%  |
| RowsBench               | bench_sort_desc_on_1k      | 2    | 3   | 73.549mb +14.09% | 40.227ms -1.84%  | ±0.48% -23.46%  |
| RowsBench               | bench_sort_entries_on_1k   | 2    | 3   | 75.912mb +13.65% | 7.372ms -3.54%   | ±0.88% +164.01% |
| RowsBench               | bench_sort_on_1k           | 2    | 3   | 73.471mb +14.11% | 28.914ms -1.31%  | ±1.46% -1.05%   |
| RowsBench               | bench_take_1k_on_10k       | 10   | 3   | 73.471mb +14.11% | 13.482μs -1.47%  | ±1.06% -32.58%  |
| RowsBench               | bench_take_right_1k_on_10k | 10   | 3   | 73.471mb +14.11% | 16.212μs +0.82%  | ±1.63% +55.38%  |
| RowsBench               | bench_unique_on_1k         | 2    | 3   | 96.479mb +6.40%  | 190.532ms -0.37% | ±0.90% +22.64%  |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 116.718mb +0.01% | 507.385ms +4.74% | ±0.22% -69.62%  |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 60.196mb +0.02%  | 255.960ms +3.95% | ±0.29% -43.63%  |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 15.130mb +0.06%  | 54.362ms +3.22%  | ±0.94% -24.98%  |
| TypeDetectorBench       | bench_type_detector        | 1    | 3   | 59.961mb -0.00%  | 428.140ms -1.43% | ±0.34% +46.03%  |
| TypeDetectorBench       | bench_type_detector        | 1    | 3   | 14.500mb -0.00%  | 85.088ms -2.54%  | ±0.56% +563.11% |
+-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+

@github-actions github-actions bot added the docs label Mar 31, 2024
@norberttech norberttech merged commit ef10a6d into flow-php:1.x Apr 2, 2024
@norberttech norberttech deleted the feature/remove-null-type branch May 9, 2024 08:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant