@norberttech

Change Log

Added

  • DOMElementAttributesCount scalar function
  • DOMElementAttributeValue scalar function
  • dom_element_to_string helper function
  • Support for casting DOMElement into different types in Caster

Fixed

Changed

  • XMLReaderExtractor now yields each row with a node entry holding a \DOMElement instead of a \DOMDocument
  • DOMElementAttribute scalar function is now deprecated in favor of DOMElementAttributeValue

Removed

Deprecated

Security


Description

The biggest change in this PR is how XMLReaderExtractor yields rows.

Previously, each row had a node entry of the XMLEntry type, which means the value under it was a \DOMDocument. That is counter-intuitive: when reading from XML we are reading elements, not documents.

That's why many scalar functions (such as element value or attribute value) couldn't be applied to the node directly.
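To illustrate why an element is the more natural row value, here is a minimal sketch that uses only PHP's built-in XMLReader and DOM classes, not Flow's extractor or scalar functions. The sample XML, the `user` element name, and the array keys are assumptions made up for this example.

```php
<?php

declare(strict_types=1);

// Hypothetical sample input, not taken from this PR.
$xml = '<users><user id="1" role="admin">Alice</user><user id="2" role="guest">Bob</user></users>';

$reader = new XMLReader();
$reader->XML($xml);

$rows = [];

while ($reader->read()) {
    // Only stop at opening <user> tags.
    if ($reader->nodeType !== XMLReader::ELEMENT || $reader->name !== 'user') {
        continue;
    }

    // expand() copies the current node (with its subtree) into a DOM object.
    $node = $reader->expand();

    if ($node instanceof DOMElement) {
        // Attribute and value access is available directly on the element;
        // a DOMDocument has no getAttribute() method of its own.
        $rows[] = [
            'id' => $node->getAttribute('id'),
            'attributes_count' => $node->attributes->length,
            'value' => $node->textContent,
        ];
    }
}

$reader->close();
```

With a \DOMDocument in the row, the same lookups would first have to go through `documentElement`, which is the extra step the element-based row avoids.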

Additionally, Caster now supports DOMElement when casting to scalar values.
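As a rough idea of what casting an element to a scalar can look like, the sketch below converts an element's text content with plain PHP. `cast_element` is a hypothetical helper written for this example; it is not Flow's Caster, and the assumption that the cast is based on the trimmed text content is mine, not something this PR states.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of Flow): casts the trimmed text content
 * of a DOMElement to the requested scalar type.
 */
function cast_element(DOMElement $element, string $type): int|float|bool|string
{
    $text = trim($element->textContent);

    return match ($type) {
        'integer' => (int) $text,
        'float' => (float) $text,
        'boolean' => filter_var($text, FILTER_VALIDATE_BOOLEAN),
        'string' => $text,
        default => throw new InvalidArgumentException("Unsupported type: {$type}"),
    };
}

// Hypothetical sample element.
$document = new DOMDocument();
$document->loadXML('<price currency="USD"> 42 </price>');

$element = $document->documentElement;

if (!$element instanceof DOMElement) {
    throw new RuntimeException('Expected a root element');
}

$asInteger = cast_element($element, 'integer');
$asFloat = cast_element($element, 'float');
$asString = cast_element($element, 'string');
```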

github-actions bot commented Aug 4, 2024

Flow PHP - Benchmarks

Results of the benchmarks from this PR are compared with the results from 1.x branch.

Extractors
+-----------------------+-------------------+------+-----+-----------------+-------------------+-----------------+
| benchmark             | subject           | revs | its | mem_peak        | mode              | rstdev          |
+-----------------------+-------------------+------+-----+-----------------+-------------------+-----------------+
| CSVExtractorBench     | bench_extract_10k | 1    | 3   | 3.961mb +4.25%  | 510.203ms -4.48%  | ±0.27% -84.00%  |
| JsonExtractorBench    | bench_extract_10k | 1    | 3   | 4.094mb +4.12%  | 1.061s -2.98%     | ±1.33% -40.71%  |
| ParquetExtractorBench | bench_extract_10k | 1    | 3   | 28.553mb +0.60% | 435.821ms -5.17%  | ±0.79% -42.78%  |
| TextExtractorBench    | bench_extract_10k | 1    | 3   | 3.721mb +4.53%  | 34.075ms -2.42%   | ±1.16% -19.25%  |
| XmlExtractorBench     | bench_extract_10k | 1    | 3   | 3.668mb +4.83%  | 316.704ms -29.40% | ±2.84% +532.83% |
+-----------------------+-------------------+------+-----+-----------------+-------------------+-----------------+
Transformers
+-----------------------------+--------------------------+------+-----+------------------+------------------+----------------+
| benchmark                   | subject                  | revs | its | mem_peak         | mode             | rstdev         |
+-----------------------------+--------------------------+------+-----+------------------+------------------+----------------+
| RenameEntryTransformerBench | bench_transform_10k_rows | 1    | 3   | 116.062mb +0.15% | 59.834ms -37.88% | ±1.20% -63.06% |
+-----------------------------+--------------------------+------+-----+------------------+------------------+----------------+
Loaders
+--------------------+----------------+------+-----+------------------+------------------+----------------+
| benchmark          | subject        | revs | its | mem_peak         | mode             | rstdev         |
+--------------------+----------------+------+-----+------------------+------------------+----------------+
| CSVLoaderBench     | bench_load_10k | 1    | 3   | 54.183mb +0.31%  | 85.566ms -17.82% | ±0.70% -69.03% |
| JsonLoaderBench    | bench_load_10k | 1    | 3   | 102.505mb +0.16% | 53.416ms -44.94% | ±1.59% -14.18% |
| ParquetLoaderBench | bench_load_10k | 1    | 3   | 123.833mb +0.14% | 1.249s -6.81%    | ±0.76% +46.78% |
| TextLoaderBench    | bench_load_10k | 1    | 3   | 16.978mb +0.99%  | 44.770ms -10.70% | ±0.41% -83.78% |
+--------------------+----------------+------+-----+------------------+------------------+----------------+
Building Blocks
+-------------------------+----------------------------+------+-----+------------------+-------------------+------------------+
| benchmark               | subject                    | revs | its | mem_peak         | mode              | rstdev           |
+-------------------------+----------------------------+------+-----+------------------+-------------------+------------------+
| TypeDetectorBench       | bench_type_detector        | 1    | 3   | 52.645mb +0.32%  | 413.827ms -2.88%  | ±3.41% +118.64%  |
| TypeDetectorBench       | bench_type_detector        | 1    | 3   | 12.911mb +1.29%  | 84.504ms +5.06%   | ±2.77% +42.18%   |
| RowsBench               | bench_chunk_10_on_10k      | 2    | 3   | 86.819mb +0.19%  | 3.535ms +1.29%    | ±1.71% +148.10%  |
| RowsBench               | bench_diff_left_1k_on_10k  | 2    | 3   | 102.420mb +0.17% | 190.927ms -21.91% | ±1.23% -21.10%   |
| RowsBench               | bench_diff_right_1k_on_10k | 2    | 3   | 85.140mb +0.20%  | 19.092ms -2.16%   | ±0.65% +243.55%  |
| RowsBench               | bench_drop_1k_on_10k       | 2    | 3   | 88.059mb +0.19%  | 1.828ms +3.55%    | ±1.13% -51.62%   |
| RowsBench               | bench_drop_right_1k_on_10k | 2    | 3   | 88.059mb +0.19%  | 1.818ms -5.18%    | ±0.74% -23.39%   |
| RowsBench               | bench_entries_on_10k       | 2    | 3   | 85.171mb +0.20%  | 2.758ms -21.08%   | ±1.26% +1333.15% |
| RowsBench               | bench_filter_on_10k        | 2    | 3   | 85.700mb +0.20%  | 17.943ms -12.00%  | ±2.98% +145.87%  |
| RowsBench               | bench_find_on_10k          | 2    | 3   | 85.700mb +0.20%  | 18.322ms -11.00%  | ±0.51% -84.23%   |
| RowsBench               | bench_find_one_on_10k      | 10   | 3   | 83.603mb +0.20%  | 1.706μs -31.75%   | ±2.72% -16.73%   |
| RowsBench               | bench_first_on_10k         | 10   | 3   | 83.603mb +0.20%  | 0.400μs -20.00%   | ±0.00% +0.00%    |
| RowsBench               | bench_flat_map_on_1k       | 2    | 3   | 92.954mb +0.18%  | 12.819ms +0.20%   | ±0.49% -67.84%   |
| RowsBench               | bench_map_on_10k           | 2    | 3   | 122.325mb +0.14% | 61.099ms -38.45%  | ±0.41% -87.38%   |
| RowsBench               | bench_merge_1k_on_10k      | 2    | 3   | 86.219mb +0.20%  | 1.534ms -1.35%    | ±2.27% -29.65%   |
| RowsBench               | bench_partition_by_on_10k  | 2    | 3   | 89.568mb +0.19%  | 60.790ms -22.55%  | ±0.19% -94.40%   |
| RowsBench               | bench_remove_on_10k        | 2    | 3   | 88.321mb +0.19%  | 4.210ms +0.21%    | ±1.59% +103.92%  |
| RowsBench               | bench_sort_asc_on_1k       | 2    | 3   | 83.750mb +0.20%  | 38.889ms -4.62%   | ±0.93% -54.73%   |
| RowsBench               | bench_sort_by_on_1k        | 2    | 3   | 83.751mb +0.20%  | 39.165ms -2.97%   | ±1.61% +13.91%   |
| RowsBench               | bench_sort_desc_on_1k      | 2    | 3   | 83.750mb +0.20%  | 38.667ms -2.20%   | ±1.01% +287.35%  |
| RowsBench               | bench_sort_entries_on_1k   | 2    | 3   | 86.045mb +0.20%  | 7.481ms -2.94%    | ±0.45% -88.59%   |
| RowsBench               | bench_sort_on_1k           | 2    | 3   | 83.603mb +0.20%  | 29.730ms +2.85%   | ±1.18% +65.28%   |
| RowsBench               | bench_take_1k_on_10k       | 10   | 3   | 83.603mb +0.20%  | 13.488μs -23.43%  | ±0.70% -78.66%   |
| RowsBench               | bench_take_right_1k_on_10k | 10   | 3   | 83.603mb +0.20%  | 16.300μs -25.41%  | ±0.50% -84.18%   |
| RowsBench               | bench_unique_on_1k         | 2    | 3   | 102.421mb +0.17% | 192.335ms -25.71% | ±0.68% -80.71%   |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 106.836mb +0.16% | 471.988ms -6.14%  | ±0.73% -47.85%   |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 55.194mb +0.31%  | 239.471ms -5.31%  | ±0.88% -17.42%   |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 14.032mb +1.20%  | 51.219ms -8.84%   | ±0.57% -82.27%   |
+-------------------------+----------------------------+------+-----+------------------+-------------------+------------------+

@norberttech norberttech merged commit 57d233b into flow-php:1.x Aug 4, 2024
@norberttech norberttech deleted the feature/xml-reading-improvemets branch December 5, 2024 20:19