@norberttech
Member

Change Log

Added

  • XMLEntry
  • XMLNodeEntry
  • ref('...')->xpath('...') - for extracting specific nodes from XMLEntry
  • ref('...')->domNodeAttribute('...') - for extracting value of attribute
  • ref('...')->domNodeValue('...') - for extracting value of node
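
For readers unfamiliar with the underlying operations, the three helpers listed above can be pictured with PHP's built-in DOM extension. This is only a rough sketch in plain PHP, not Flow's implementation: the mapping from each Flow function to a DOM call is an assumption based on the function names, and the sample XML is a shortened, hypothetical fragment.

```php
<?php

// Plain PHP DOM sketch; an illustration of the concepts, not Flow's code.
$xml = <<<XML
<Salaries>
    <Month name="January">
        <Department name="HR">
            <TotalSalary>71883</TotalSalary>
        </Department>
        <Department name="Engineering">
            <TotalSalary>192644</TotalSalary>
        </Department>
    </Month>
</Salaries>
XML;

$document = new DOMDocument();
$document->loadXML($xml);

$xpath = new DOMXPath($document);

// Roughly what xpath('...') does: select the nodes matching an expression.
$departments = $xpath->query('/Salaries/Month/Department');

$rows = [];
foreach ($departments as $department) {
    // Relative XPath query, evaluated against the current Department node.
    $salaryNode = $xpath->query('TotalSalary', $department)->item(0);

    $rows[] = [
        // Roughly what domNodeAttribute('name') does: read one attribute.
        'department_name' => $department->getAttribute('name'),
        // Roughly what domNodeValue() does: read the node's text content.
        'department_salary' => (int) $salaryNode->nodeValue,
    ];
}

print_r($rows);
```

In the pipeline these steps are chained through `ref()` expressions instead of being written out as a loop.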

Fixed

Changed

  • XMLReaderExtractor now returns the XMLEntry type instead of casting XML to arrays

Removed

Deprecated

Security


Description

The goal of this PR is to make XMLReaderExtractor consistent with the Extractors architecture.
Originally it read XML and transformed NodeElements into arrays. This was heavily inspired by Spark, but it added transformation responsibility to the Extractor.

This PR changes that (which is a minor BC break): from now on, XMLReaderExtractor returns XMLEntry, which can later be transformed using dedicated XML transformers.

The following code shows how this can be done:

XML:

<Salaries>
    <Month name="January">
        <Department name="HR">
            <TotalSalary>71883</TotalSalary>
        </Department>
        <Department name="Engineering">
            <TotalSalary>192644</TotalSalary>
        </Department>
        <Department name="Finance">
            <TotalSalary>174187</TotalSalary>
        </Department>
        <Department name="Marketing">
            <TotalSalary>179932</TotalSalary>
        </Department>
        <Department name="Sales">
            <TotalSalary>52056</TotalSalary>
        </Department>
    </Month>
    <Month name="February">
        <Department name="HR">
            <TotalSalary>102342</TotalSalary>
        </Department>
        <Department name="Engineering">
            <TotalSalary>111102</TotalSalary>
        </Department>
        <Department name="Finance">
            <TotalSalary>81938</TotalSalary>
        </Department>
        <Department name="Marketing">
            <TotalSalary>132202</TotalSalary>
        </Department>
        <Department name="Sales">
            <TotalSalary>173225</TotalSalary>
        </Department>
    </Month>
    <Month name="March">
        <Department name="HR">
            <TotalSalary>79619</TotalSalary>
        </Department>
        <Department name="Engineering">
            <TotalSalary>99387</TotalSalary>
        </Department>
        <Department name="Finance">
            <TotalSalary>198847</TotalSalary>
        </Department>
        <Department name="Marketing">
            <TotalSalary>50550</TotalSalary>
        </Department>
        <Department name="Sales">
            <TotalSalary>98212</TotalSalary>
        </Department>
    </Month>
    <Month name="April">
        <Department name="HR">
            <TotalSalary>69721</TotalSalary>
        </Department>
        <Department name="Engineering">
            <TotalSalary>151826</TotalSalary>
        </Department>
        <Department name="Finance">
            <TotalSalary>158168</TotalSalary>
        </Department>
        <Department name="Marketing">
            <TotalSalary>111872</TotalSalary>
        </Department>
        <Department name="Sales">
            <TotalSalary>172334</TotalSalary>
        </Department>
    </Month>
    <Month name="May">
        <Department name="HR">
            <TotalSalary>174220</TotalSalary>
        </Department>
        <Department name="Engineering">
            <TotalSalary>164086</TotalSalary>
        </Department>
        <Department name="Finance">
            <TotalSalary>104257</TotalSalary>
        </Department>
        <Department name="Marketing">
            <TotalSalary>105817</TotalSalary>
        </Department>
        <Department name="Sales">
            <TotalSalary>145490</TotalSalary>
        </Department>
    </Month>
    <Month name="June">
        <Department name="HR">
            <TotalSalary>127383</TotalSalary>
        </Department>
        <Department name="Engineering">
            <TotalSalary>52592</TotalSalary>
        </Department>
        <Department name="Finance">
            <TotalSalary>71732</TotalSalary>
        </Department>
        <Department name="Marketing">
            <TotalSalary>165083</TotalSalary>
        </Department>
        <Department name="Sales">
            <TotalSalary>85138</TotalSalary>
        </Department>
    </Month>
</Salaries>

Code:

<?php

(new Flow())
    // Read the XML file; the extracted XML is referenced below as 'row'
    ->read(XML::from(__FLOW_DATA__ . '/salaries.xml'))
    // Select all Month nodes, then expand them into one row per month
    ->withEntry('months', ref('row')->xpath('/Salaries/Month'))
    ->withEntry('month', ref('months')->expand())
    ->withEntry('month_name', ref('month')->domNodeAttribute('name'))
    // Select the Department nodes of each month, one row per department
    ->withEntry('departments', ref('month')->xpath('/Month/Department'))
    ->withEntry('department', ref('departments')->expand())
    ->withEntry('department_name', ref('department')->domNodeAttribute('name'))
    ->withEntry('department_salary', ref('department')->xpath('/Department/TotalSalary')->domNodeValue())
    // Drop the intermediate XML entries, keeping only the extracted values
    ->drop('row', 'months', 'month', 'departments', 'department')
    // Sum department salaries per month
    ->groupBy(ref('month_name'))
    ->aggregate(Aggregation::sum(ref('department_salary')))
    ->rename('department_salary_sum', 'total_monthly_salaries')
    ->write(To::output(false))
    ->run();

Output:

Reading XML dataset...
+------------+------------------------+
| month_name | total_monthly_salaries |
+------------+------------------------+
|    January |                 670702 |
|   February |                 600809 |
|      March |                 526615 |
|      April |                 663921 |
|        May |                 693870 |
|       June |                 501928 |
+------------+------------------------+
6 rows

@github-actions
Contributor

github-actions bot commented Sep 5, 2023

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

@github-actions
Contributor

github-actions bot commented Sep 6, 2023

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

@norberttech
Member Author

@stloyd @JacekPolanski thanks for the review 🙏

@norberttech norberttech merged commit 57666bf into flow-php:1.x Sep 6, 2023