Skip to content

Conversation

@ndossche
Copy link
Member

@ndossche ndossche commented Feb 4, 2023

I saw a report here: https://gitlab.alpinelinux.org/alpine/aports/-/issues/14381#note_276471 that metaphone under Alpine was way slower than under Debian. Making less calls into libc should improve performance on both Alpine and other distros.

All inputs for ENCODE() are already uppercase, so there's no need to
spend time uppercasing them again.
If it's a zero-terminator check, or an isalpha() check there's no need
to convert it to uppercase first.
@ndossche ndossche changed the title Metaphone Metaphone cleanup Feb 4, 2023
@devnexen
Copy link
Member

devnexen commented Feb 4, 2023

Looks fine but did you measure any improvement ?

@ndossche
Copy link
Member Author

ndossche commented Feb 4, 2023

I did measure an improvement over a simple test case (executing the metaphone test in php 10K times) and the amount of instructions executed dropped from 851,037,969 to 699,677,969, so a 17.7% perf increase.
The profile still clearly shows toupper to be executed many many times redundantly, so I think I can improve this further. I'll look into improving this more later today :)

We don't have to re-read letters, and re-uppercase them if we already
did it once. By caching these results we gain performance.
Furthermore, we can avoid fetching and uppercasing in some conditions by
first checking what we already had: e.g. if a condition depends on both
Prev_Letter and After_Next_Letter, but we already have Prev_Letter
cached we can place that first to avoid a fetch+toupper of the
"after next letter".
@ndossche
Copy link
Member Author

ndossche commented Feb 4, 2023

I pushed an additional performance improvement.
Final performance numbers: started at 851,037,969 instructions and got it down to 557,080,277, which is a 34.54% improvement in performance over master.
This is without sacrificing too much readability of the code imo.

Main ideas were to cache the read and uppercased result and in some cases reorder the condition in an if check to avoid fetching and uppercasing if the condition couldn't be fulfilled anyway.

@ndossche ndossche changed the title Metaphone cleanup Metaphone performance improvement Feb 4, 2023
@devnexen devnexen requested a review from Girgias February 4, 2023 15:34
Copy link
Member

@Girgias Girgias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

It may also make sense to implement Double Metaphone (released in 2000) or an old version of Metaphone 3 which exists under a BSD licence (https://github.com/OpenRefine/OpenRefine/blob/master/main/src/com/google/refine/clustering/binning/Metaphone3.java) to improve the accuracy of this function.

@Girgias Girgias merged commit c9cbe52 into php:master Feb 5, 2023
@andypost
Copy link
Contributor

andypost commented Feb 5, 2023

Will it be backported to 8.2?

@Girgias
Copy link
Member

Girgias commented Feb 5, 2023

Will it be backported to 8.2?

No.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants