-
-
Notifications
You must be signed in to change notification settings - Fork 48
Closed
Description
This is a use case from a question we got on a discord server:
Input:
<rss xmlns:g="http://base.google.com/ns/1.0">
<channel>
<item>
<g:price>123 EUR</g:price>
</item>
<item>
<g:price>154.12 USD</g:price>
</item>
</channel>
</rss>expected output:
(price and currency in separate columns)
+--------------+--------+----------+
| price_string | price | currency |
+--------------+--------+----------+
| 123 EUR | 123.00 | EUR |
+--------------+--------+----------+
1 rows
+--------------+--------+----------+
| price_string | price | currency |
+--------------+--------+----------+
| 154.12 USD | 154.12 | USD |
+--------------+--------+----------+
Currently, it can be achieved by replacing currency/price with empty strings:
df()
->read(from_xml(__DIR__ . '/file.xml', 'rss/channel/item'))
->withEntry('price_string', ref('node')->xpath('g:price')->domElementValue())
->withEntry('price', ref('price_string')->regexReplace(lit('/(A-Z{3})/'), lit(''))->cast('float')->round(lit(2)))
->withEntry('currency', ref('price_string')->regexReplace(lit('/\d+(?:\.\d+)?/'), lit('')))
->drop('node')
->write(to_output(false))
->run();Because we don't have a scalar function that would extract a match from regexp.
I think we can add something similar to:
ref('name')->regex(lit("pattern"), match: lit(0))Regex - ScalarFunction similar to PregMatch function but instead of confirming that the string matches a pattern, it would let us return given match (or multiple matches maybe)
Metadata
Metadata
Assignees
Labels
No labels
Type
Projects
Status
Done