@flavioheleno
Contributor

@flavioheleno flavioheleno commented Jul 3, 2024

Change Log

Added

  • Added support for BROTLI compression algorithm to parquet

Fixed

Changed

Removed

Deprecated

Security


Description

Add support for BROTLI compression.

Closes #784.

@github-actions
Contributor

github-actions bot commented Jul 3, 2024

Flow PHP - Benchmarks

Results of the benchmarks from this PR are compared with the results from 1.x branch.

Extractors
+-----------------------+-------------------+------+-----+------------------+------------------+-----------------+
| benchmark             | subject           | revs | its | mem_peak         | mode             | rstdev          |
+-----------------------+-------------------+------+-----+------------------+------------------+-----------------+
| CSVExtractorBench     | bench_extract_10k | 1    | 3   | 3.913mb +0.04%   | 505.119ms -1.04% | ±0.37% +3.36%   |
| JsonExtractorBench    | bench_extract_10k | 1    | 3   | 3.946mb +0.04%   | 1.065s -0.86%    | ±0.26% -64.66%  |
| ParquetExtractorBench | bench_extract_10k | 1    | 3   | 135.382mb +0.00% | 734.997ms -1.12% | ±0.48% -57.48%  |
| TextExtractorBench    | bench_extract_10k | 1    | 3   | 3.672mb +0.05%   | 34.205ms +1.91%  | ±1.34% +236.46% |
| XmlExtractorBench     | bench_extract_10k | 1    | 3   | 3.619mb +0.05%   | 429.989ms -1.13% | ±0.70% -51.56%  |
+-----------------------+-------------------+------+-----+------------------+------------------+-----------------+
Transformers
+-----------------------------+--------------------------+------+-----+------------------+-----------------+-----------------+
| benchmark                   | subject                  | revs | its | mem_peak         | mode            | rstdev          |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+-----------------+
| RenameEntryTransformerBench | bench_transform_10k_rows | 1    | 3   | 115.965mb +0.00% | 59.003ms -2.15% | ±1.32% +118.81% |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+-----------------+
Loaders
+--------------------+----------------+------+-----+------------------+-----------------+------------------+
| benchmark          | subject        | revs | its | mem_peak         | mode            | rstdev           |
+--------------------+----------------+------+-----+------------------+-----------------+------------------+
| CSVLoaderBench     | bench_load_10k | 1    | 3   | 54.068mb +0.00%  | 85.177ms +0.97% | ±0.78% -54.69%   |
| JsonLoaderBench    | bench_load_10k | 1    | 3   | 106.500mb +0.00% | 51.186ms -3.71% | ±0.25% -77.08%   |
| ParquetLoaderBench | bench_load_10k | 1    | 3   | 225.839mb +0.00% | 1.390s -0.17%   | ±0.16% -78.98%   |
| TextLoaderBench    | bench_load_10k | 1    | 3   | 16.862mb +0.01%  | 43.578ms -1.01% | ±1.25% +1124.97% |
+--------------------+----------------+------+-----+------------------+-----------------+------------------+
Building Blocks
+-------------------------+----------------------------+------+-----+------------------+------------------+------------------+
| benchmark               | subject                    | revs | its | mem_peak         | mode             | rstdev           |
+-------------------------+----------------------------+------+-----+------------------+------------------+------------------+
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 116.515mb +0.00% | 493.866ms -1.95% | ±1.97% -30.46%   |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 59.993mb +0.00%  | 249.077ms -1.65% | ±0.58% -37.44%   |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 14.927mb +0.01%  | 52.951ms +0.32%  | ±0.39% -55.51%   |
| TypeDetectorBench       | bench_type_detector        | 1    | 3   | 59.694mb +0.00%  | 430.109ms -4.46% | ±0.27% -88.73%   |
| TypeDetectorBench       | bench_type_detector        | 1    | 3   | 14.233mb +0.01%  | 84.435ms -1.58%  | ±0.33% -70.19%   |
| RowsBench               | bench_chunk_10_on_10k      | 2    | 3   | 86.785mb +0.00%  | 3.222ms -6.19%   | ±1.13% -68.83%   |
| RowsBench               | bench_diff_left_1k_on_10k  | 2    | 3   | 102.383mb +0.00% | 192.301ms +2.13% | ±0.30% -87.29%   |
| RowsBench               | bench_diff_right_1k_on_10k | 2    | 3   | 85.103mb +0.00%  | 18.958ms -0.73%  | ±0.48% +116.29%  |
| RowsBench               | bench_drop_1k_on_10k       | 2    | 3   | 88.025mb +0.00%  | 1.755ms +2.59%   | ±1.52% -37.05%   |
| RowsBench               | bench_drop_right_1k_on_10k | 2    | 3   | 88.025mb +0.00%  | 1.684ms -10.92%  | ±1.79% +648.50%  |
| RowsBench               | bench_entries_on_10k       | 2    | 3   | 85.137mb +0.00%  | 2.636ms +3.23%   | ±1.94% +33.68%   |
| RowsBench               | bench_filter_on_10k        | 2    | 3   | 85.666mb +0.00%  | 14.921ms +2.40%  | ±1.37% +1205.70% |
| RowsBench               | bench_find_on_10k          | 2    | 3   | 85.666mb +0.00%  | 14.925ms +1.77%  | ±1.03% +31.07%   |
| RowsBench               | bench_find_one_on_10k      | 10   | 3   | 83.570mb +0.00%  | 1.694μs +5.47%   | ±2.83% -2.00%    |
| RowsBench               | bench_first_on_10k         | 10   | 3   | 83.570mb +0.00%  | 0.300μs +50.00%  | ±0.00% -100.00%  |
| RowsBench               | bench_flat_map_on_1k       | 2    | 3   | 92.920mb +0.00%  | 12.158ms +0.42%  | ±1.02% +656.47%  |
| RowsBench               | bench_map_on_10k           | 2    | 3   | 122.291mb +0.00% | 60.177ms -2.29%  | ±0.54% +15.38%   |
| RowsBench               | bench_merge_1k_on_10k      | 2    | 3   | 86.186mb +0.00%  | 1.268ms +3.07%   | ±3.38% +121.77%  |
| RowsBench               | bench_partition_by_on_10k  | 2    | 3   | 89.533mb +0.00%  | 64.409ms +5.37%  | ±1.56% -48.84%   |
| RowsBench               | bench_remove_on_10k        | 2    | 3   | 88.287mb +0.00%  | 3.834ms -0.93%   | ±0.45% +234.01%  |
| RowsBench               | bench_sort_asc_on_1k       | 2    | 3   | 83.714mb +0.00%  | 39.006ms -0.35%  | ±0.54% -39.22%   |
| RowsBench               | bench_sort_by_on_1k        | 2    | 3   | 83.714mb +0.00%  | 39.849ms +1.68%  | ±0.40% -61.07%   |
| RowsBench               | bench_sort_desc_on_1k      | 2    | 3   | 83.714mb +0.00%  | 39.047ms -1.94%  | ±0.37% -79.11%   |
| RowsBench               | bench_sort_entries_on_1k   | 2    | 3   | 86.011mb +0.00%  | 7.383ms +0.48%   | ±3.47% +363.94%  |
| RowsBench               | bench_sort_on_1k           | 2    | 3   | 83.570mb +0.00%  | 28.559ms +0.64%  | ±0.86% +611.96%  |
| RowsBench               | bench_take_1k_on_10k       | 10   | 3   | 83.570mb +0.00%  | 13.300μs -0.22%  | ±1.23% -29.85%   |
| RowsBench               | bench_take_right_1k_on_10k | 10   | 3   | 83.570mb +0.00%  | 16.234μs +4.65%  | ±1.27% +109.86%  |
| RowsBench               | bench_unique_on_1k         | 2    | 3   | 102.384mb +0.00% | 193.549ms +0.22% | ±0.27% -18.34%   |
+-------------------------+----------------------------+------+-----+------------------+------------------+------------------+

@norberttech
Member

Before I merge this one (once the conflicts are resolved), would you mind also mentioning the extensions required to use particular compression algorithms in the parquet docs? At the bottom of this page.

@flavioheleno
Contributor Author

for sure!

@github-actions github-actions bot added the docs label Jul 4, 2024
@norberttech norberttech merged commit 9bbe385 into flow-php:1.x Jul 4, 2024
@flavioheleno flavioheleno deleted the feat/brotli branch July 4, 2024 11:54
