
crypto.sha3: rewrite and optimize kaccak_p_1600_24() engine, update tests #26524

Merged
spytheman merged 1 commit into vlang:master from tankf33der:sha3
Feb 6, 2026



@tankf33der

Contributor

I'm finally ready to show my patch for accelerating sha3 performance.
This is roughly the fourth generation of the patch, the result of several weeks of development (and fun).
It started as a patch that gave a 10% speedup and ended up as a multi-fold speedup with both tcc and gcc.

If you profile my standard sha3 performance test file, you can see multiple function calls inside each round. Once I got rid of those, the rest was just a matter of technique.

import crypto.sha3
import time

fn main() {
	a := []u8{len: 10_000_000}
	t1 := time.now()
	_ := sha3.sum512(a)
	println(time.since(t1))
}
         calls       total time        self time   total/call   function
        138889         93.624ms         46.706ms            674ns crypto__sha3__State_xor_bytes
       1250001         46.917ms         46.917ms             38ns encoding__binary__little_endian_u64_at 
       3333336         83.607ms         83.607ms             25ns crypto__sha3__State_iota 
        138889       8219.910ms        101.634ms          59183ns crypto__sha3__State_kaccak_p_1600_24 
       3333336        522.927ms        522.927ms            157ns crypto__sha3__State_pi 
       3333336       8118.276ms        556.868ms           2435ns crypto__sha3__State_rnd 
       3333336        684.678ms        684.678ms            205ns crypto__sha3__State_chi 
       3333336       1454.097ms       1026.246ms            436ns crypto__sha3__State_theta 
     100000080       2475.980ms       2475.980ms             25ns math__bits__rotate_left_64 
       3333336       4816.100ms       2767.971ms           1445ns crypto__sha3__State_rho 
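The call counts in this profile line up with the structure of SHA3-512. The arithmetic below is my own check of the numbers shown, not part of the original report: SHA3-512 absorbs 72-byte blocks, each permutation runs 24 rounds, and each block is read as 9 little-endian 64-bit lanes. Splitting the 30 rotations per round into 5 for theta and 25 for rho is an inference from the total-minus-self times of `State_theta` and `State_rho`, and should be read as an assumption.

```latex
\begin{aligned}
\text{rate} &= 1600 - 2 \cdot 512 = 576\ \text{bits} = 72\ \text{bytes} \\
\text{blocks} &= \lfloor 10\,000\,000 / 72 \rfloor + 1 = 138\,888 + 1 = 138\,889 \\
\text{lane loads} &= 9 \cdot 138\,889 = 1\,250\,001 \\
\text{rounds} &= 24 \cdot 138\,889 = 3\,333\,336 \\
\text{rotations} &= (5_{\theta} + 25_{\rho}) \cdot 3\,333\,336 = 100\,000\,080
\end{aligned}
```

In other words, every one of the 3,333,336 rounds paid for separate calls to `rnd`, `theta`, `rho`, `pi`, `chi` and `iota` on top of its 30 rotations.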

Even if you check whether the compiler inlined them, the calls still turn out to be costly.
Besides, the official site suggests merging several of these step functions into one, after which they are not needed at all.
The latest generation of the patch simply unrolls the loops and makes them cheaper.
It took some tinkering.
I have my own tests with full coverage, using test-vector files and calls to openssl, so I'm not worried about correctness.
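To illustrate what merging the step functions into one can look like, here is a minimal sketch in V, assuming the usual FIPS 202 lane layout (lane (x, y) stored at index x + 5*y). It is not the PR's code: it fuses theta, rho, pi, chi and iota into one loop-based function in the style of the compact "tiny_sha3" C implementation, and walks the pi permutation cycle so that rho and pi become a single pass. The names `keccak_f1600`, `round_constants`, `rotc` and `piln` are hypothetical; per the description above, the PR goes further and unrolls these loops.

```v
import math.bits

// Keccak-f[1600] round constants for the iota step (FIPS 202).
const round_constants = [u64(0x0000000000000001), 0x0000000000008082, 0x800000000000808a,
	0x8000000080008000, 0x000000000000808b, 0x0000000080000001, 0x8000000080008081,
	0x8000000000008009, 0x000000000000008a, 0x0000000000000088, 0x0000000080008009,
	0x000000008000000a, 0x000000008000808b, 0x800000000000008b, 0x8000000000008089,
	0x8000000000008003, 0x8000000000008002, 0x8000000000000080, 0x000000000000800a,
	0x800000008000000a, 0x8000000080008081, 0x8000000000008080, 0x0000000080000001,
	0x8000000080008008]

// Hypothetical fused rho + pi tables: step i moves the carried lane to
// index piln[i] and rotates it left by rotc[i] (the rho offset of that lane).
const rotc = [1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 2, 14, 27, 41, 56, 8, 25, 43, 62, 18, 39, 61,
	20, 44]
const piln = [10, 7, 11, 17, 18, 3, 5, 16, 8, 21, 24, 4, 15, 23, 19, 13, 12, 2, 20, 14, 22, 9,
	6, 1]

// keccak_f1600 applies the 24-round permutation in place; lane (x, y) is st[x + 5 * y].
fn keccak_f1600(mut st [25]u64) {
	mut bc := [5]u64{}
	for rnd in 0 .. 24 {
		// theta: column parities, then xor each column with its neighbours' mix
		for i in 0 .. 5 {
			bc[i] = st[i] ^ st[i + 5] ^ st[i + 10] ^ st[i + 15] ^ st[i + 20]
		}
		for i in 0 .. 5 {
			d := bc[(i + 4) % 5] ^ bits.rotate_left_64(bc[(i + 1) % 5], 1)
			for j := 0; j < 25; j += 5 {
				st[j + i] ^= d
			}
		}
		// rho + pi in one pass: follow the pi cycle starting at lane (1, 0).
		// Lane (0, 0) has rho offset 0 and is fixed by pi, so it is skipped.
		mut carried := st[1]
		for i in 0 .. 24 {
			j := piln[i]
			saved := st[j]
			st[j] = bits.rotate_left_64(carried, rotc[i])
			carried = saved
		}
		// chi: the only non-linear step, applied row by row
		for j := 0; j < 25; j += 5 {
			for i in 0 .. 5 {
				bc[i] = st[j + i]
			}
			for i in 0 .. 5 {
				st[j + i] ^= ~bc[(i + 1) % 5] & bc[(i + 2) % 5]
			}
		}
		// iota: xor the round constant into lane (0, 0)
		st[0] ^= round_constants[rnd]
	}
}

fn main() {
	// Known answer: first lane after permuting the all-zero state.
	mut st := [25]u64{}
	keccak_f1600(mut st)
	assert st[0] == u64(0xf1258f7940e1dde7)
	println(st[0].hex())
}
```

Because the fused rho/pi pass skips lane (0, 0), this sketch performs 5 + 24 = 29 `rotate_left_64` calls per round and no other calls. Fully unrolling the loops, as the PR does, removes the index arithmetic and the `% 5` operations as well.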

Now the profiler shows reasonable numbers:

         calls       total time        self time   total/call   function
             2          0.010ms          0.010ms           5018ns builtin___write_buf_to_fd
             2          0.010ms          0.010ms           5174ns builtin___v_malloc 
             2          0.019ms          0.017ms           9376ns time__linux_now 
             6          1.239ms          1.239ms         206538ns builtin__vcalloc_noscan 
        277779         10.739ms         10.739ms             39ns builtin__array_slice 
             1       5363.798ms         18.982ms     5363798118ns crypto__sha3__Digest_write 
        138889         91.799ms         45.508ms            661ns crypto__sha3__State_xor_bytes 
       1250001         46.292ms         46.292ms             37ns encoding__binary__little_endian_u64_at 
      96666744       2336.159ms       2336.159ms             24ns math__bits__rotate_left_64 
        138889       5242.316ms       2906.158ms          37745ns crypto__sha3__State_kaccak_p_1600_24 
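Applying the same arithmetic to the new profile (again my own check of the numbers shown, with the theta/rho split as an assumption): the step functions are gone, `rotate_left_64` is now called 29 times per round, which is consistent with dropping the zero-offset rotation of lane (0, 0), and the permutation's callee time is almost exactly the rotation time.

```latex
\begin{aligned}
\frac{96\,666\,744}{24 \cdot 138\,889} &= \frac{96\,666\,744}{3\,333\,336} = 29 = 5_{\theta} + 24_{\rho} \\
5242.316\ \text{ms} - 2906.158\ \text{ms} &= 2336.158\ \text{ms} \approx 2336.159\ \text{ms}\ (\texttt{rotate\_left\_64})
\end{aligned}
```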

I had to drop some tests because they became impossible: the code they relied on simply no longer exists.
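Tests that only go through the public hashing API are unaffected by this kind of internal rewrite. A minimal known-answer check might look like the sketch below. It assumes `sha3.sum256` exists alongside the `sha3.sum512` used in the benchmark above; the expected digests are the standard SHA3-256 values for the empty message and for "abc".

```v
import crypto.sha3

fn main() {
	// SHA3-256 of the empty message
	empty := sha3.sum256([]u8{})
	assert empty.hex() == 'a7ffc6f8bf1ed76651c14756a061d662f580ff4de43b49fa82d80a4b80f8434a'
	// SHA3-256 of "abc"
	abc := sha3.sum256('abc'.bytes())
	assert abc.hex() == '3a985da74fe225b2045c172d6bd390bd855f086e3e9d525b46bfe24511431532'
	println('known-answer tests passed')
}
```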

Speedup: ~4.5x or more with tcc, ~3x or more with gcc.

@tankf33der

Contributor Author

@blackshirt, please take a look. Of course, I've tested it with your pslhdsa implementation.

@tankf33der

Contributor Author

@kimshrier, please take a look. What do you think?

@spytheman

Contributor

On my M1:
[screenshot of benchmark results not preserved]

using a variation of this command (in case someone wants to re-check on another machine):

branch=$(git rev-parse --abbrev-ref HEAD); for compiler in tcc clang gcc-15; do bname=sha_${compiler}_${branch}; v -cc $compiler -o $bname sha.v && ll $bname && xtime ./$bname; done

@spytheman spytheman left a comment

Contributor


Excellent work.
Thank you, @tankf33der 🙇🏻.

@spytheman

Contributor

> I have my own tests with full coverage for files with test vectors and openssl calls so I'm not worried.

Can you please submit some of them to https://github.com/vlang/slower_tests (it is a separate repo, but it is also tested by the main CI)?

@kimshrier

Contributor

Thanks for improving the performance. I wrote a very straightforward implementation and did not have time to optimize it; I was more concerned with making it correct.

I have been preoccupied with other personal matters, and this will continue to be the case for several more months. I am glad that you took the time to make it better.

@medvednikov

Member

Amazing work!

@spytheman spytheman merged commit 65cf633 into vlang:master Feb 6, 2026
77 of 80 checks passed
cestef pushed a commit to cestef/v that referenced this pull request Mar 9, 2026


4 participants