
@norberttech

Change Log

Added

  • Allow using a JSON pointer as the entry name for extracted records
  • More tests for the array_to_row and array_to_rows functions

Fixed

  • Fixed JsonExtractor when a pointer is used to extract flat arrays

Changed

  • When JsonExtractor uses a pointer, the pointer can become the entry key (controlled by pointerToEntryName)

Removed

Deprecated

Security


Description

Resolves: #1210

The regression was introduced here, when I wasn't sure how to handle this edge case.

I also added an additional argument to JsonExtractor::withPointer(string $pointer, bool $pointerToEntryName) : self.
What's the difference?

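<?php

use function Flow\ETL\Adapter\JSON\from_json;
use function Flow\ETL\DSL\{df, to_output};

require __DIR__ . '/vendor/autoload.php';

df()
    ->read(from_json(__DIR__ . '/file.json')->withPointer('/data'))
    ->write(to_output(false)) // false: print values without truncation
    ->run();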

This will return:

+-----+-------------+-------------------------------------+--------------+--------------+--------------+----------------+--------+-----+----------------------+------+------+------+----------+--------------+-----+
| e00 |         e01 |                                 e02 |          e03 |          e04 |          e05 |            e06 |    e07 | e08 |                  e09 |  e10 |  e11 |  e12 |      e13 |          e14 | e15 |
+-----+-------------+-------------------------------------+--------------+--------------+--------------+----------------+--------+-----+----------------------+------+------+------+----------+--------------+-----+
| 647 | Margie Rowe | [email protected] | E-Purchasing | July 9, 2024 | June 1, 2024 | Rp. 15,284,386 | 101753 |   0 | Procurement of Goods | true | true | APBD | 52049496 | June 1, 2024 |     |
+-----+-------------+-------------------------------------+--------------+--------------+--------------+----------------+--------+-----+----------------------+------+------+------+----------+--------------+-----+
1 rows
+-----+-------------+-------------------------------------+-----------------+----------------+---------------+----------------+-------+-----+-----+-------+-------+------+----------+------------------+--------+
| e00 |         e01 |                                 e02 |             e03 |            e04 |           e05 |            e06 |   e07 | e08 | e09 |   e10 |   e11 |  e12 |      e13 |              e14 |    e15 |
+-----+-------------+-------------------------------------+-----------------+----------------+---------------+----------------+-------+-----+-----+-------+-------+------+----------+------------------+--------+
| 165 | Margie Rowe | [email protected] | Self-management | March 25, 2024 | March 1, 2024 | Rp. 81,778,000 | 60276 |   1 |     | false | false | APBD | 36070200 | December 1, 2024 | Type I |
+-----+-------------+-------------------------------------+-----------------+----------------+---------------+----------------+-------+-----+-----+-------+-------+------+----------+------------------+--------+
1 rows
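In the default mode each item under /data is itself a flat JSON array (integer keys 0..15), which appears to be what triggered the "Passed array keys must be a string" error reported in #1210; the output above shows each value receiving a generated string name instead (e00 to e15). Below is a minimal sketch of that naming step in plain PHP 8.1+ (needed for array_is_list()). list_to_named_row() is a hypothetical helper for illustration, not Flow's implementation, and the padding rule (width = digits of the highest index) is an assumption inferred from the e00 to e15 names.

```php
<?php

/**
 * Hypothetical helper (not part of Flow PHP): turn a list (integer keys
 * 0..n-1) into an array with string keys, so every value can become a named
 * entry. The "e" prefix mirrors the e00...e15 names in the output above;
 * the zero-padding rule is an assumption.
 */
function list_to_named_row(array $values): array
{
    if (!array_is_list($values)) {
        return $values; // not a list: keep the existing keys as-is
    }

    // e.g. 16 values -> highest index 15 -> 2 digits -> e00 ... e15
    $width = strlen((string) max(count($values) - 1, 0));
    $row = [];

    foreach ($values as $index => $value) {
        $row[sprintf('e%0' . $width . 'd', $index)] = $value;
    }

    return $row;
}

$row = list_to_named_row(['647', 'Margie Rowe', 'E-Purchasing']);
echo implode(', ', array_keys($row)), PHP_EOL; // e0, e1, e2
```

With three values the highest index is 2, so one digit suffices; a 16-element row like the ones above yields e00 through e15.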

But setting pointerToEntryName to true:

<?php 

df()
    ->read(from_json(__DIR__ . '/file.json')->withPointer('/data', pointerToEntryName: true))
    ->write(to_output(false))
    ->run();

will now return something like this:

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                                                                                                                                                           /data |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ["647","Margie Rowe","[email protected]","E-Purchasing","July 9, 2024","June 1, 2024","Rp. 15,284,386","101753","0","Procurement of Goods","true","true","APBD","52049496","June 1, 2024",""] |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 rows
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                                                                                                                                                        /data |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ["165","Margie Rowe","[email protected]","Self-management","March 25, 2024","March 1, 2024","Rp. 81,778,000","60276","1","","false","false","APBD","36070200","December 1, 2024","Type I"] |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 rows
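To make the two modes concrete, here is a plain-PHP sketch of what pointer-based extraction does conceptually: resolve the pointer as defined by RFC 6901 (JSON Pointer), then emit one row per item, either the item itself or the item wrapped under an entry named after the pointer when pointerToEntryName is true. resolve_json_pointer() and rows_from_pointer() are hypothetical helpers written for illustration, not JsonExtractor's code, and they assume the whole document fits in memory after json_decode().

```php
<?php

/**
 * Resolve an RFC 6901 JSON Pointer against json_decode(..., true) output.
 * Hypothetical helper for illustration; Flow PHP's own resolver may differ.
 */
function resolve_json_pointer(array $document, string $pointer): mixed
{
    if ($pointer === '') {
        return $document; // the empty pointer refers to the whole document
    }

    if ($pointer[0] !== '/') {
        throw new InvalidArgumentException("Invalid JSON pointer: {$pointer}");
    }

    $current = $document;

    foreach (explode('/', substr($pointer, 1)) as $token) {
        // RFC 6901 unescaping: "~1" -> "/" first, then "~0" -> "~"
        $token = str_replace(['~1', '~0'], ['/', '~'], $token);

        if (!is_array($current) || !array_key_exists($token, $current)) {
            throw new OutOfBoundsException("Pointer {$pointer} does not resolve");
        }

        $current = $current[$token];
    }

    return $current;
}

/**
 * Hypothetical sketch of the two modes: every item under the pointer becomes
 * one row; with $pointerToEntryName the whole item is stored under a single
 * entry named after the pointer (like the "/data" column above).
 */
function rows_from_pointer(array $document, string $pointer, bool $pointerToEntryName = false): array
{
    $items = resolve_json_pointer($document, $pointer);

    if (!is_array($items)) {
        throw new UnexpectedValueException("Pointer {$pointer} must point at an array");
    }

    $rows = [];

    foreach ($items as $item) {
        $rows[] = $pointerToEntryName ? [$pointer => $item] : $item;
    }

    return $rows;
}

$document = json_decode('{"data": [["647", "Margie Rowe"], ["165", "Margie Rowe"]]}', true, 512, JSON_THROW_ON_ERROR);

echo json_encode(rows_from_pointer($document, '/data', true)[0], JSON_UNESCAPED_SLASHES), PHP_EOL;
// {"/data":["647","Margie Rowe"]}
```

The printed row has the same shape as the single /data column in the output above: one entry whose value is the whole flat array.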

@github-actions

github-actions bot commented Sep 6, 2024

Flow PHP - Benchmarks

Results of the benchmarks from this PR are compared with the results from the 1.x branch.

Extractors
+-----------------------+-------------------+------+-----+-----------------+------------------+----------------+
| benchmark             | subject           | revs | its | mem_peak        | mode             | rstdev         |
+-----------------------+-------------------+------+-----+-----------------+------------------+----------------+
| CSVExtractorBench     | bench_extract_10k | 1    | 3   | 4.537mb -0.08%  | 505.653ms -0.96% | ±0.34% +35.38% |
| JsonExtractorBench    | bench_extract_10k | 1    | 3   | 4.601mb +1.00%  | 1.072s +1.82%    | ±0.66% +84.03% |
| ParquetExtractorBench | bench_extract_10k | 1    | 3   | 29.108mb -0.01% | 430.762ms -1.43% | ±1.16% +28.95% |
| TextExtractorBench    | bench_extract_10k | 1    | 3   | 4.293mb -0.08%  | 34.191ms +1.25%  | ±0.87% +19.08% |
| XmlExtractorBench     | bench_extract_10k | 1    | 3   | 4.284mb -0.08%  | 780.426ms +1.32% | ±2.39% -1.13%  |
+-----------------------+-------------------+------+-----+-----------------+------------------+----------------+
Transformers
+-----------------------------+--------------------------+------+-----+------------------+-----------------+-----------------+
| benchmark                   | subject                  | revs | its | mem_peak         | mode            | rstdev          |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+-----------------+
| RenameEntryTransformerBench | bench_transform_10k_rows | 1    | 3   | 116.570mb -0.00% | 59.658ms -1.77% | ±0.98% +188.16% |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+-----------------+
Loaders
+--------------------+----------------+------+-----+------------------+------------------+-----------------+
| benchmark          | subject        | revs | its | mem_peak         | mode             | rstdev          |
+--------------------+----------------+------+-----+------------------+------------------+-----------------+
| CSVLoaderBench     | bench_load_10k | 1    | 3   | 54.735mb -0.01%  | 139.709ms -3.21% | ±0.73% -54.41%  |
| JsonLoaderBench    | bench_load_10k | 1    | 3   | 90.343mb -0.00%  | 119.857ms +2.83% | ±1.43% +244.93% |
| ParquetLoaderBench | bench_load_10k | 1    | 3   | 124.462mb -0.00% | 1.237s -0.22%    | ±0.40% -73.19%  |
| TextLoaderBench    | bench_load_10k | 1    | 3   | 17.484mb -0.02%  | 44.272ms +0.39%  | ±0.78% +9.32%   |
+--------------------+----------------+------+-----+------------------+------------------+-----------------+
Building Blocks
+-------------------------+----------------------------+------+-----+------------------+------------------+------------------+
| benchmark               | subject                    | revs | its | mem_peak         | mode             | rstdev           |
+-------------------------+----------------------------+------+-----+------------------+------------------+------------------+
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 107.412mb -0.00% | 467.526ms +0.59% | ±0.62% -48.33%   |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 55.771mb -0.01%  | 238.724ms -0.91% | ±1.72% -50.09%   |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 14.609mb -0.02%  | 52.041ms +0.90%  | ±0.58% +17.76%   |
| RowsBench               | bench_chunk_10_on_10k      | 2    | 3   | 87.326mb -0.00%  | 3.429ms -6.70%   | ±1.51% +19.38%   |
| RowsBench               | bench_diff_left_1k_on_10k  | 2    | 3   | 102.930mb -0.00% | 189.497ms +1.56% | ±0.26% -64.49%   |
| RowsBench               | bench_diff_right_1k_on_10k | 2    | 3   | 85.650mb -0.00%  | 18.751ms +1.04%  | ±0.70% +1370.24% |
| RowsBench               | bench_drop_1k_on_10k       | 2    | 3   | 88.566mb -0.00%  | 1.710ms -2.64%   | ±1.19% +72.09%   |
| RowsBench               | bench_drop_right_1k_on_10k | 2    | 3   | 88.566mb -0.00%  | 1.771ms -6.20%   | ±1.21% -40.92%   |
| RowsBench               | bench_entries_on_10k       | 2    | 3   | 85.678mb -0.00%  | 2.768ms -9.24%   | ±0.82% -65.13%   |
| RowsBench               | bench_filter_on_10k        | 2    | 3   | 86.207mb -0.00%  | 16.002ms -3.79%  | ±1.15% +104.20%  |
| RowsBench               | bench_find_on_10k          | 2    | 3   | 86.207mb -0.00%  | 16.127ms -4.13%  | ±0.47% -5.26%    |
| RowsBench               | bench_find_one_on_10k      | 10   | 3   | 84.111mb -0.00%  | 1.694μs 0.00%    | ±2.83% 0.00%     |
| RowsBench               | bench_first_on_10k         | 10   | 3   | 84.111mb -0.00%  | 0.300μs -25.00%  | ±0.00% -100.00%  |
| RowsBench               | bench_flat_map_on_1k       | 2    | 3   | 93.461mb -0.00%  | 12.702ms +0.17%  | ±0.77% -57.45%   |
| RowsBench               | bench_map_on_10k           | 2    | 3   | 122.832mb -0.00% | 60.086ms -1.99%  | ±0.44% -35.53%   |
| RowsBench               | bench_merge_1k_on_10k      | 2    | 3   | 86.727mb -0.00%  | 1.457ms -4.95%   | ±1.30% +24.51%   |
| RowsBench               | bench_partition_by_on_10k  | 2    | 3   | 90.083mb -0.00%  | 58.285ms -8.67%  | ±0.85% -11.99%   |
| RowsBench               | bench_remove_on_10k        | 2    | 3   | 88.828mb -0.00%  | 4.194ms -3.53%   | ±2.74% +9.89%    |
| RowsBench               | bench_sort_asc_on_1k       | 2    | 3   | 84.260mb -0.00%  | 38.505ms +0.47%  | ±0.76% -24.94%   |
| RowsBench               | bench_sort_by_on_1k        | 2    | 3   | 84.261mb -0.00%  | 38.930ms +0.24%  | ±0.43% -69.93%   |
| RowsBench               | bench_sort_desc_on_1k      | 2    | 3   | 84.260mb -0.00%  | 38.536ms -0.39%  | ±1.62% +133.75%  |
| RowsBench               | bench_sort_entries_on_1k   | 2    | 3   | 86.552mb -0.00%  | 7.310ms -2.80%   | ±0.39% -77.02%   |
| RowsBench               | bench_sort_on_1k           | 2    | 3   | 84.111mb -0.00%  | 28.607ms +0.84%  | ±1.62% +33.15%   |
| RowsBench               | bench_take_1k_on_10k       | 10   | 3   | 84.111mb -0.00%  | 13.318μs -2.31%  | ±1.06% -30.32%   |
| RowsBench               | bench_take_right_1k_on_10k | 10   | 3   | 84.111mb -0.00%  | 16.300μs -0.22%  | ±2.50% +46.11%   |
| RowsBench               | bench_unique_on_1k         | 2    | 3   | 102.931mb -0.00% | 190.926ms +0.68% | ±0.50% -43.47%   |
| TypeDetectorBench       | bench_type_detector        | 1    | 3   | 53.216mb -0.01%  | 419.390ms +7.21% | ±0.80% -75.60%   |
| TypeDetectorBench       | bench_type_detector        | 1    | 3   | 13.482mb -0.03%  | 79.145ms -5.26%  | ±0.42% -87.85%   |
+-------------------------+----------------------------+------+-----+------------------+------------------+------------------+

@norberttech norberttech merged commit 4eaee66 into flow-php:1.x Sep 6, 2024
@norberttech norberttech deleted the bug/gh-1210 branch December 5, 2024 20:18

Development

Successfully merging this pull request may close these issues.

Processing JSON array trigger Passed array keys must be a string. Maybe consider using "array_to_rows()" function? error