crypto.sha3: rewrite and optimize kaccak_p_1600_24() engine, update tests #26524
@blackshirt take a look. Of course I've tested it with your pslhdsa implementation.

@kimshrier - take a look. What do you think?
spytheman
left a comment
Excellent work.
Thank you @tankf33der 🙇🏻 .
Can you please submit some of them to https://github.com/vlang/slower_tests (it is a separate repo, but it is also tested by the main CI)?

Thanks for improving the performance. I did a very straightforward implementation and did not have time to optimize it. I was more concerned with having it be correct. I have been preoccupied with other, personal, stuff and this will continue to be the case for several more months. I am glad that you took the time to make it better.

Amazing work!

I finally want to show the patch for accelerating sha3 performance.
This is approximately the 4th generation of the patch, after several weeks of development and fun.
It all started with a patch that gave a 10% speedup, and ended up with a multi-fold speedup for both tcc and gcc.
If you take my standard file for sha3 performance testing, you can see multiple function calls inside the rounds; once I conquered that, the rest was just a matter of technique.
Even if you check whether the compiler inlined them, they still turn out to be costly.
Besides, the official site suggests merging several functions into one and then they are not needed at all.
The latest generation of the patch consists of simply unrolling the loops and making them less costly.
Had to tinker with it.
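
For readers unfamiliar with the technique, here is a minimal sketch in C of what unrolling one Keccak step can look like. It is not the code from this patch: it covers only the theta step, and the names `theta_loops`, `theta_unrolled` and `theta_versions_agree` are made up for illustration. The loop version uses modulo indexing and temporary arrays, while the unrolled version keeps the column parities in local variables and writes every lane update out explicitly.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rotate a 64-bit lane left by n bits (0 < n < 64). */
static uint64_t rotl64(uint64_t v, unsigned n) {
    return (v << n) | (v >> (64 - n));
}

/* Theta step written with loops; lane (x, y) is stored at a[x + 5*y]. */
void theta_loops(uint64_t a[25]) {
    uint64_t c[5], d[5];
    for (int x = 0; x < 5; x++)
        c[x] = a[x] ^ a[x + 5] ^ a[x + 10] ^ a[x + 15] ^ a[x + 20];
    for (int x = 0; x < 5; x++)
        d[x] = c[(x + 4) % 5] ^ rotl64(c[(x + 1) % 5], 1);
    for (int i = 0; i < 25; i++)
        a[i] ^= d[i % 5];
}

/* The same step fully unrolled: no loop counters, no modulo, no arrays. */
void theta_unrolled(uint64_t a[25]) {
    uint64_t c0 = a[0] ^ a[5] ^ a[10] ^ a[15] ^ a[20];
    uint64_t c1 = a[1] ^ a[6] ^ a[11] ^ a[16] ^ a[21];
    uint64_t c2 = a[2] ^ a[7] ^ a[12] ^ a[17] ^ a[22];
    uint64_t c3 = a[3] ^ a[8] ^ a[13] ^ a[18] ^ a[23];
    uint64_t c4 = a[4] ^ a[9] ^ a[14] ^ a[19] ^ a[24];

    uint64_t d0 = c4 ^ rotl64(c1, 1);
    uint64_t d1 = c0 ^ rotl64(c2, 1);
    uint64_t d2 = c1 ^ rotl64(c3, 1);
    uint64_t d3 = c2 ^ rotl64(c4, 1);
    uint64_t d4 = c3 ^ rotl64(c0, 1);

    a[0] ^= d0; a[5] ^= d0; a[10] ^= d0; a[15] ^= d0; a[20] ^= d0;
    a[1] ^= d1; a[6] ^= d1; a[11] ^= d1; a[16] ^= d1; a[21] ^= d1;
    a[2] ^= d2; a[7] ^= d2; a[12] ^= d2; a[17] ^= d2; a[22] ^= d2;
    a[3] ^= d3; a[8] ^= d3; a[13] ^= d3; a[18] ^= d3; a[23] ^= d3;
    a[4] ^= d4; a[9] ^= d4; a[14] ^= d4; a[19] ^= d4; a[24] ^= d4;
}

/* Returns 1 if both versions give the same result on a fixed test state. */
int theta_versions_agree(void) {
    uint64_t s1[25], s2[25];
    for (int i = 0; i < 25; i++) {
        /* Arbitrary deterministic fill; unsigned overflow wraps. */
        s1[i] = 0x9E3779B97F4A7C15ULL * (uint64_t)(i + 1);
        s2[i] = s1[i];
    }
    theta_loops(s1);
    theta_unrolled(s2);
    return memcmp(s1, s2, sizeof s1) == 0;
}
```

Calling `theta_versions_agree()` from a test is a cheap way to make sure an unrolled rewrite still matches the straightforward reference. The same pattern would apply to the other round steps, which this sketch leaves out.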
I have my own tests with full coverage for files with test vectors and openssl calls so I'm not worried.
Now the profiler shows normal metrics:
I had to sacrifice some tests because they became impossible: the code they relied on simply no longer exists.
Speedup: tcc ~4.5+ times, gcc ~3+ times.