Add support for BROTLI compression #1106

Merged
norberttech merged 2 commits into flow-php:1.x from flavioheleno:feat/brotli on Jul 4, 2024

Conversation

@flavioheleno (Contributor) commented Jul 3, 2024

Change Log

Added

  • Added support for the BROTLI compression algorithm to Parquet

Fixed

Changed

Removed

Deprecated

Security


Description

Add support for BROTLI compression.

Closes #784.

github-actions bot (Contributor) commented Jul 3, 2024

Flow PHP - Benchmarks

Results of the benchmarks from this PR are compared with the results from the 1.x branch.

Extractors
+-----------------------+-------------------+------+-----+------------------+------------------+-----------------+
| benchmark             | subject           | revs | its | mem_peak         | mode             | rstdev          |
+-----------------------+-------------------+------+-----+------------------+------------------+-----------------+
| CSVExtractorBench     | bench_extract_10k | 1    | 3   | 3.913mb +0.04%   | 505.119ms -1.04% | ±0.37% +3.36%   |
| JsonExtractorBench    | bench_extract_10k | 1    | 3   | 3.946mb +0.04%   | 1.065s -0.86%    | ±0.26% -64.66%  |
| ParquetExtractorBench | bench_extract_10k | 1    | 3   | 135.382mb +0.00% | 734.997ms -1.12% | ±0.48% -57.48%  |
| TextExtractorBench    | bench_extract_10k | 1    | 3   | 3.672mb +0.05%   | 34.205ms +1.91%  | ±1.34% +236.46% |
| XmlExtractorBench     | bench_extract_10k | 1    | 3   | 3.619mb +0.05%   | 429.989ms -1.13% | ±0.70% -51.56%  |
+-----------------------+-------------------+------+-----+------------------+------------------+-----------------+
Transformers
+-----------------------------+--------------------------+------+-----+------------------+-----------------+-----------------+
| benchmark                   | subject                  | revs | its | mem_peak         | mode            | rstdev          |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+-----------------+
| RenameEntryTransformerBench | bench_transform_10k_rows | 1    | 3   | 115.965mb +0.00% | 59.003ms -2.15% | ±1.32% +118.81% |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+-----------------+
Loaders
+--------------------+----------------+------+-----+------------------+-----------------+------------------+
| benchmark          | subject        | revs | its | mem_peak         | mode            | rstdev           |
+--------------------+----------------+------+-----+------------------+-----------------+------------------+
| CSVLoaderBench     | bench_load_10k | 1    | 3   | 54.068mb +0.00%  | 85.177ms +0.97% | ±0.78% -54.69%   |
| JsonLoaderBench    | bench_load_10k | 1    | 3   | 106.500mb +0.00% | 51.186ms -3.71% | ±0.25% -77.08%   |
| ParquetLoaderBench | bench_load_10k | 1    | 3   | 225.839mb +0.00% | 1.390s -0.17%   | ±0.16% -78.98%   |
| TextLoaderBench    | bench_load_10k | 1    | 3   | 16.862mb +0.01%  | 43.578ms -1.01% | ±1.25% +1124.97% |
+--------------------+----------------+------+-----+------------------+-----------------+------------------+
Building Blocks
+-------------------------+----------------------------+------+-----+------------------+------------------+------------------+
| benchmark               | subject                    | revs | its | mem_peak         | mode             | rstdev           |
+-------------------------+----------------------------+------+-----+------------------+------------------+------------------+
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 116.515mb +0.00% | 493.866ms -1.95% | ±1.97% -30.46%   |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 59.993mb +0.00%  | 249.077ms -1.65% | ±0.58% -37.44%   |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 14.927mb +0.01%  | 52.951ms +0.32%  | ±0.39% -55.51%   |
| TypeDetectorBench       | bench_type_detector        | 1    | 3   | 59.694mb +0.00%  | 430.109ms -4.46% | ±0.27% -88.73%   |
| TypeDetectorBench       | bench_type_detector        | 1    | 3   | 14.233mb +0.01%  | 84.435ms -1.58%  | ±0.33% -70.19%   |
| RowsBench               | bench_chunk_10_on_10k      | 2    | 3   | 86.785mb +0.00%  | 3.222ms -6.19%   | ±1.13% -68.83%   |
| RowsBench               | bench_diff_left_1k_on_10k  | 2    | 3   | 102.383mb +0.00% | 192.301ms +2.13% | ±0.30% -87.29%   |
| RowsBench               | bench_diff_right_1k_on_10k | 2    | 3   | 85.103mb +0.00%  | 18.958ms -0.73%  | ±0.48% +116.29%  |
| RowsBench               | bench_drop_1k_on_10k       | 2    | 3   | 88.025mb +0.00%  | 1.755ms +2.59%   | ±1.52% -37.05%   |
| RowsBench               | bench_drop_right_1k_on_10k | 2    | 3   | 88.025mb +0.00%  | 1.684ms -10.92%  | ±1.79% +648.50%  |
| RowsBench               | bench_entries_on_10k       | 2    | 3   | 85.137mb +0.00%  | 2.636ms +3.23%   | ±1.94% +33.68%   |
| RowsBench               | bench_filter_on_10k        | 2    | 3   | 85.666mb +0.00%  | 14.921ms +2.40%  | ±1.37% +1205.70% |
| RowsBench               | bench_find_on_10k          | 2    | 3   | 85.666mb +0.00%  | 14.925ms +1.77%  | ±1.03% +31.07%   |
| RowsBench               | bench_find_one_on_10k      | 10   | 3   | 83.570mb +0.00%  | 1.694μs +5.47%   | ±2.83% -2.00%    |
| RowsBench               | bench_first_on_10k         | 10   | 3   | 83.570mb +0.00%  | 0.300μs +50.00%  | ±0.00% -100.00%  |
| RowsBench               | bench_flat_map_on_1k       | 2    | 3   | 92.920mb +0.00%  | 12.158ms +0.42%  | ±1.02% +656.47%  |
| RowsBench               | bench_map_on_10k           | 2    | 3   | 122.291mb +0.00% | 60.177ms -2.29%  | ±0.54% +15.38%   |
| RowsBench               | bench_merge_1k_on_10k      | 2    | 3   | 86.186mb +0.00%  | 1.268ms +3.07%   | ±3.38% +121.77%  |
| RowsBench               | bench_partition_by_on_10k  | 2    | 3   | 89.533mb +0.00%  | 64.409ms +5.37%  | ±1.56% -48.84%   |
| RowsBench               | bench_remove_on_10k        | 2    | 3   | 88.287mb +0.00%  | 3.834ms -0.93%   | ±0.45% +234.01%  |
| RowsBench               | bench_sort_asc_on_1k       | 2    | 3   | 83.714mb +0.00%  | 39.006ms -0.35%  | ±0.54% -39.22%   |
| RowsBench               | bench_sort_by_on_1k        | 2    | 3   | 83.714mb +0.00%  | 39.849ms +1.68%  | ±0.40% -61.07%   |
| RowsBench               | bench_sort_desc_on_1k      | 2    | 3   | 83.714mb +0.00%  | 39.047ms -1.94%  | ±0.37% -79.11%   |
| RowsBench               | bench_sort_entries_on_1k   | 2    | 3   | 86.011mb +0.00%  | 7.383ms +0.48%   | ±3.47% +363.94%  |
| RowsBench               | bench_sort_on_1k           | 2    | 3   | 83.570mb +0.00%  | 28.559ms +0.64%  | ±0.86% +611.96%  |
| RowsBench               | bench_take_1k_on_10k       | 10   | 3   | 83.570mb +0.00%  | 13.300μs -0.22%  | ±1.23% -29.85%   |
| RowsBench               | bench_take_right_1k_on_10k | 10   | 3   | 83.570mb +0.00%  | 16.234μs +4.65%  | ±1.27% +109.86%  |
| RowsBench               | bench_unique_on_1k         | 2    | 3   | 102.384mb +0.00% | 193.549ms +0.22% | ±0.27% -18.34%   |
+-------------------------+----------------------------+------+-----+------------------+------------------+------------------+
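A note on reading the rstdev column: with only 3 iterations per benchmark, changes such as +1124.97% can look alarming even when the deviation itself stays small. The sketch below recovers the approximate baseline value from a row, assuming the reported change is computed as (current - baseline) / baseline * 100. That formula is an assumption about the benchmark report, not something stated in this thread, and the helper name `baseline_from_delta` is hypothetical. The inputs are the rounded figures from the tables above, so the results are approximate.

```python
def baseline_from_delta(current: float, delta_percent: float) -> float:
    """Recover the baseline value from the current value and its relative change.

    Assumes delta_percent = (current - baseline) / baseline * 100.
    """
    return current / (1.0 + delta_percent / 100.0)


# (benchmark, rstdev in this PR [%], reported change [%]), copied from the tables above
rows = [
    ("TextLoaderBench bench_load_10k", 1.25, 1124.97),
    ("RowsBench bench_filter_on_10k", 1.37, 1205.70),
    ("ParquetLoaderBench bench_load_10k", 0.16, -78.98),
]

for name, current, delta in rows:
    baseline = baseline_from_delta(current, delta)
    # The difference in percentage points is usually easier to judge than the relative change
    print(
        f"{name}: baseline ~{baseline:.2f}%, now {current:.2f}% "
        f"({current - baseline:+.2f} points)"
    )
```

Under that assumption, the largest relative swings in the rstdev column correspond to shifts of roughly one percentage point in run-to-run variation.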

@norberttech (Member) commented

Before I merge this one (once conflicts are resolved), would you mind also mentioning the extensions required to use particular compression algorithms in the Parquet docs, at the bottom of this page?

@flavioheleno (Contributor, Author) commented

for sure!

@github-actions github-actions bot added the docs label Jul 4, 2024
@norberttech norberttech merged commit 9bbe385 into flow-php:1.x Jul 4, 2024
@flavioheleno flavioheleno deleted the feat/brotli branch July 4, 2024 11:54