-
Notifications
You must be signed in to change notification settings - Fork 2.6k
[WIP] Use SequenceEqual with Intrinsics to use both Vector sizes at once #22019
Conversation
|
Not quite sure where I've broken it atm |
77d17ed to
37284d2
Compare
|
Performance numbers to show that this actually makes a difference? |
37284d2 to
ee51327
Compare
Currently regresses 😢 |
|
Improves for large lengths > 512, but regresses for shorter lengths, which isn't what I was going for... |
9f69600 to
39ec7b2
Compare
| Unsafe.ReadUnaligned<Vector<byte>>(ref Unsafe.AddByteOffset(ref second, n)); | ||
| } | ||
| // Use Sse2.IsSupported check to remove this branch at Jit time if the CPU does not support it. | ||
| else if (!Avx2.IsSupported && Sse2.IsSupported) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the redundant !Avx2.IsSupported needed here? I would expect that the if (Avx2.IsSupported) above will be allow JIT to dead-code eliminate the else part.
9e23426 to
07fa382
Compare
|
Tried lots of permutations nothing that's a significant and consistent win; will try another time. Has given me ideas on the other vectorized methods where there should be easier wins. |
The .NET Vector type is always the largest supported size; so if Avx2 is enabled byte sequences have to be 32 or larger to benefit from vectorization.
Using the Intrinsics we can enable it using Sse2 for sizes 16 or larger and then switch to Avx2 for 32 or larger on Avx2 machines.
Also some tweaks for codegen compactness
The intrinsic variant is slightly better than the output the Jit generates as it drops a
vmovapsinstructionG_M17140_IG11: C4A179100409 vmovupd xmm0, xmmword ptr [rcx+r9] C4A179100C0A vmovupd xmm1, xmmword ptr [rdx+r9] - C5FC28D0 vmovaps ymm2, ymm0 - C5ED76D1 vpcmpeqd ymm2, ymm1 + C5F974C1 vpcmpeqb xmm0, xmm0, xmm1 - C5FDD7C2 vpmovmskb eax, ymm2 + C579D7D0 vpmovmskb r10d, xmm0 - 83F8FF cmp eax, -1 + 4183FAFF cmp r10d, -1 - 0F858E000000 jne G_M17103_IG14 + 752B jne SHORT G_M17140_IG14/cc @CarolEidt @fiigii @tannergooding