Skip to content

Conversation

@ndossche
Copy link
Member

@ndossche ndossche commented Sep 22, 2023

https://bugs.php.net/bug.php?id=53655

The XPath query is in accordance to spec [1]. However, we can do it in a
simpler way. We can use a custom callback function instead of a linear
search in XPath to check if a node is visible. Note that comment nodes
are handled internally by libxml2 already, so we do not need to
differentiate between node types. The callback will do an upwards
traversal of the tree until the root of the canonicalization is reached.
In practice this will speed up the application a lot.

[1] https://www.w3.org/TR/2001/REC-xml-c14n-20010315 section 2.1

This can make processing easily 100 times faster for a large document. I generated some random XML documents with https://codebeautify.org/generate-random-xml: https://gist.github.com/nielsdos/369813d1b1c5c146a6fd7992b8ddbc28

file.xml: before -> after:
random.xml: 0.159s -> 0.004s
large.xml: 1.256s -> 0.008s

There's another speed-up I could do by replacing the linear search with a search in a HashTable, that's orthogonal to this but also a smaller time save. That's important for the cases that do use a nodeset. something to do as a follow-up probably.

…ents

The XPath query is in accordance to spec [1]. However, we can do it in a
simpler way. We can use a custom callback function instead of a linear
search in XPath to check if a node is visible. Note that comment nodes
are handled internally by libxml2 already, so we do not need to
differentiate between node types. The callback will do an upwards
traversal of the tree until the root of the canonicalization is reached.
In practice this will speed up the application a lot.

[1] https://www.w3.org/TR/2001/REC-xml-c14n-20010315 section 2.1
Copy link
Member

@Girgias Girgias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks sensible

@ndossche ndossche closed this in 5d68d61 Sep 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants