Conversation

@norberttech
Member

Change Log

Added

  • Added Schema::matches() with strict/evolving matchers

Fixed

Changed

Removed

Deprecated

Security


Description

This is an introduction to a potential schema evolution feature. Schema evolution is the process of making sure that, as we write more data to a dataset, reading from it remains safe for its clients.
The base assumptions are the following:

  • we can safely add new entries
  • we can safely make non-nullable entries nullable
  • we can safely keep schema the same

However, if we:

  • remove a field
  • make a nullable field non-nullable
  • change a field type

we can break our dataset clients.

Using Schema::matches($schema, schema_evolving_matcher()): bool should prevent us from introducing backward compatibility (BC) breaks in our datasets.
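
The rules above can be illustrated with a minimal standalone sketch. It is not the implementation from this PR: the `evolving_match()` function and the plain-array field definitions are hypothetical stand-ins for Flow's schema objects, used only to show which changes would be accepted and which rejected.

```php
<?php

/**
 * Hypothetical helper, NOT part of Flow's API.
 * Each schema is a plain array: field name => ['type' => string, 'nullable' => bool].
 *
 * @param array<string, array{type: string, nullable: bool}> $current schema already used by the dataset
 * @param array<string, array{type: string, nullable: bool}> $new     schema of the data about to be written
 */
function evolving_match(array $current, array $new): bool
{
    foreach ($current as $name => $definition) {
        // Removing a field breaks clients that still read it.
        if (!array_key_exists($name, $new)) {
            return false;
        }

        // Changing a field type breaks clients that expect the old type.
        if ($new[$name]['type'] !== $definition['type']) {
            return false;
        }

        // Making a nullable field non-nullable is treated as a break.
        // The opposite direction (non-nullable to nullable) is allowed.
        if ($definition['nullable'] && !$new[$name]['nullable']) {
            return false;
        }
    }

    // Fields that exist only in $new are additions, which are considered safe.
    return true;
}

$current = [
    'id' => ['type' => 'integer', 'nullable' => false],
    'name' => ['type' => 'string', 'nullable' => true],
];

// Same schema: safe.
var_dump(evolving_match($current, $current)); // bool(true)

// New entry added: safe.
var_dump(evolving_match($current, $current + ['email' => ['type' => 'string', 'nullable' => true]])); // bool(true)

// Non-nullable entry made nullable: safe.
var_dump(evolving_match($current, ['id' => ['type' => 'integer', 'nullable' => true]] + $current)); // bool(true)

// Field removed: BC break.
var_dump(evolving_match($current, ['id' => ['type' => 'integer', 'nullable' => false]])); // bool(false)

// Nullable field made non-nullable: BC break.
var_dump(evolving_match($current, ['name' => ['type' => 'string', 'nullable' => false]] + $current)); // bool(false)

// Field type changed: BC break.
var_dump(evolving_match($current, ['id' => ['type' => 'string', 'nullable' => false]] + $current)); // bool(false)
```

A strict matcher would presumably reject every case except the first one, since it requires both schemas to be identical; that behaviour is an assumption based on the matcher names in the change log.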

@github-actions
Contributor

Flow PHP - Benchmarks

Results of the benchmarks from this PR are compared with the results from 1.x branch.

Extractors
+-----------------------+-------------------+------+-----+------------------+------------------+-----------------+
| benchmark             | subject           | revs | its | mem_peak         | mode             | rstdev          |
+-----------------------+-------------------+------+-----+------------------+------------------+-----------------+
| AvroExtractorBench    | bench_extract_10k | 1    | 3   | 35.280mb +0.01%  | 825.295ms +0.66% | ±1.79% +322.27% |
| CSVExtractorBench     | bench_extract_10k | 1    | 3   | 5.003mb +0.05%   | 340.413ms -0.46% | ±0.36% -79.27%  |
| JsonExtractorBench    | bench_extract_10k | 1    | 3   | 5.153mb +0.05%   | 1.054s +0.89%    | ±0.87% -6.79%   |
| ParquetExtractorBench | bench_extract_10k | 1    | 3   | 135.828mb +0.00% | 897.988ms +0.04% | ±0.90% -13.10%  |
| TextExtractorBench    | bench_extract_10k | 1    | 3   | 4.910mb +0.05%   | 35.667ms +1.63%  | ±0.58% -31.80%  |
| XmlExtractorBench     | bench_extract_10k | 1    | 3   | 4.916mb +0.05%   | 431.594ms -0.43% | ±0.13% -93.35%  |
+-----------------------+-------------------+------+-----+------------------+------------------+-----------------+
Transformers
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| benchmark                   | subject                  | revs | its | mem_peak         | mode            | rstdev         |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| RenameEntryTransformerBench | bench_transform_10k_rows | 1    | 3   | 110.617mb +0.00% | 64.789ms +2.81% | ±1.83% +22.72% |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
Loaders
+--------------------+----------------+------+-----+------------------+------------------+-----------------+
| benchmark          | subject        | revs | its | mem_peak         | mode             | rstdev          |
+--------------------+----------------+------+-----+------------------+------------------+-----------------+
| AvroLoaderBench    | bench_load_10k | 1    | 3   | 95.660mb +0.00%  | 466.758ms +0.66% | ±0.62% -4.44%   |
| CSVLoaderBench     | bench_load_10k | 1    | 3   | 54.142mb +0.00%  | 71.865ms +0.78%  | ±0.69% -42.49%  |
| JsonLoaderBench    | bench_load_10k | 1    | 3   | 106.568mb +0.00% | 53.708ms +1.81%  | ±1.18% +74.60%  |
| ParquetLoaderBench | bench_load_10k | 1    | 3   | 224.390mb +0.00% | 1.427s +0.75%    | ±0.33% -28.27%  |
| TextLoaderBench    | bench_load_10k | 1    | 3   | 17.957mb +0.01%  | 40.234ms +2.49%  | ±2.26% +211.58% |
+--------------------+----------------+------+-----+------------------+------------------+-----------------+
Building Blocks
+-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| benchmark               | subject                    | revs | its | mem_peak         | mode             | rstdev          |
+-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| RowsBench               | bench_chunk_10_on_10k      | 2    | 3   | 76.683mb +0.00%  | 3.578ms +8.46%   | ±2.44% +60.25%  |
| RowsBench               | bench_diff_left_1k_on_10k  | 2    | 3   | 96.410mb +0.00%  | 182.663ms -0.11% | ±0.25% -62.09%  |
| RowsBench               | bench_diff_right_1k_on_10k | 2    | 3   | 74.936mb +0.00%  | 18.484ms -0.60%  | ±0.54% -64.54%  |
| RowsBench               | bench_drop_1k_on_10k       | 2    | 3   | 77.923mb +0.00%  | 1.838ms +18.24%  | ±1.14% -48.74%  |
| RowsBench               | bench_drop_right_1k_on_10k | 2    | 3   | 77.923mb +0.00%  | 1.876ms +21.24%  | ±3.49% +42.47%  |
| RowsBench               | bench_entries_on_10k       | 2    | 3   | 75.035mb +0.00%  | 3.087ms +23.74%  | ±3.05% +114.62% |
| RowsBench               | bench_filter_on_10k        | 2    | 3   | 75.564mb +0.00%  | 15.475ms +3.30%  | ±1.01% -67.66%  |
| RowsBench               | bench_find_on_10k          | 2    | 3   | 75.564mb +0.00%  | 15.109ms +1.55%  | ±1.23% +422.99% |
| RowsBench               | bench_find_one_on_10k      | 10   | 3   | 73.468mb +0.00%  | 1.994μs +16.87%  | ±2.40% -11.86%  |
| RowsBench               | bench_first_on_10k         | 10   | 3   | 73.468mb +0.00%  | 0.400μs 0.00%    | ±0.00% 0.00%    |
| RowsBench               | bench_flat_map_on_1k       | 2    | 3   | 87.023mb +0.00%  | 13.015ms +1.64%  | ±0.86% +58.29%  |
| RowsBench               | bench_map_on_10k           | 2    | 3   | 116.384mb +0.00% | 66.999ms +3.87%  | ±3.23% +131.11% |
| RowsBench               | bench_merge_1k_on_10k      | 2    | 3   | 76.084mb +0.00%  | 1.328ms +17.58%  | ±3.27% +331.16% |
| RowsBench               | bench_partition_by_on_10k  | 2    | 3   | 79.431mb +0.00%  | 58.061ms +1.01%  | ±1.36% +73.90%  |
| RowsBench               | bench_remove_on_10k        | 2    | 3   | 78.185mb +0.00%  | 3.917ms +3.19%   | ±3.36% +165.13% |
| RowsBench               | bench_sort_asc_on_1k       | 2    | 3   | 73.546mb +0.00%  | 40.598ms -0.56%  | ±1.84% -6.93%   |
| RowsBench               | bench_sort_by_on_1k        | 2    | 3   | 73.547mb +0.00%  | 39.941ms -0.73%  | ±0.19% -42.42%  |
| RowsBench               | bench_sort_desc_on_1k      | 2    | 3   | 73.546mb +0.00%  | 40.295ms -0.76%  | ±0.82% -33.39%  |
| RowsBench               | bench_sort_entries_on_1k   | 2    | 3   | 75.910mb +0.00%  | 7.365ms +0.54%   | ±0.64% -55.59%  |
| RowsBench               | bench_sort_on_1k           | 2    | 3   | 73.468mb +0.00%  | 29.045ms +0.13%  | ±1.75% +61.61%  |
| RowsBench               | bench_take_1k_on_10k       | 10   | 3   | 73.468mb +0.00%  | 13.300μs +0.14%  | ±0.61% -42.70%  |
| RowsBench               | bench_take_right_1k_on_10k | 10   | 3   | 73.468mb +0.00%  | 16.060μs +1.64%  | ±2.09% +305.22% |
| RowsBench               | bench_unique_on_1k         | 2    | 3   | 96.477mb +0.00%  | 184.570ms -0.98% | ±0.53% -37.55%  |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 116.715mb +0.00% | 483.970ms -1.30% | ±1.15% +158.10% |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 60.193mb +0.00%  | 249.509ms -0.73% | ±0.76% -23.09%  |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 15.128mb +0.02%  | 53.014ms +3.79%  | ±1.35% +114.19% |
| TypeDetectorBench       | bench_type_detector        | 1    | 3   | 59.959mb +0.00%  | 435.690ms +2.15% | ±0.41% +98.14%  |
| TypeDetectorBench       | bench_type_detector        | 1    | 3   | 14.498mb +0.02%  | 86.116ms +1.31%  | ±1.16% +983.98% |
+-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+

@norberttech norberttech merged commit 8d16fcd into flow-php:1.x Mar 28, 2024
@norberttech norberttech deleted the feature/schema-matching branch May 9, 2024 08:55