Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Conversation

@benaadams
Copy link
Member

The .NET Vector type is always the largest supported size; so if Avx2 is enabled byte sequences have to be 32 or larger to benefit from vectorization.

Using the Intrinsics we can enable it using Sse2 for sizes 16 or larger and then switch to Avx2 for 32 or larger on Avx2 machines.

Also some tweaks for codegen compactness

The intrinsic variant is slightly better than the output the Jit generates as it drops a vmovaps instruction

G_M17140_IG11:
        C4A179100409         vmovupd  xmm0, xmmword ptr [rcx+r9]
        C4A179100C0A         vmovupd  xmm1, xmmword ptr [rdx+r9]
-       C5FC28D0             vmovaps  ymm2, ymm0
-       C5ED76D1             vpcmpeqd ymm2, ymm1
+       C5F974C1             vpcmpeqb xmm0, xmm0, xmm1
-       C5FDD7C2             vpmovmskb eax, ymm2
+       C579D7D0             vpmovmskb r10d, xmm0
-       83F8FF               cmp      eax, -1
+       4183FAFF             cmp      r10d, -1
-       0F858E000000         jne      G_M17103_IG14
+       752B                 jne      SHORT G_M17140_IG14

/cc @CarolEidt @fiigii @tannergooding

@benaadams
Copy link
Member Author

Not quite sure where I've broken it atm

@benaadams benaadams force-pushed the mixed-intriniscs branch 2 times, most recently from 77d17ed to 37284d2 Compare January 17, 2019 06:04
@jkotas
Copy link
Member

jkotas commented Jan 17, 2019

Performance numbers to show that this actually makes a difference?

@benaadams benaadams changed the title Use SequenceEqual with Intrinsics to use both Vector sizes at once [WIP] Use SequenceEqual with Intrinsics to use both Vector sizes at once Jan 17, 2019
@benaadams
Copy link
Member Author

Performance numbers to show that this actually makes a difference?

Currently regresses 😢

@benaadams
Copy link
Member Author

Improves for large lengths > 512, but regresses for shorter lengths, which isn't what I was going for...

Unsafe.ReadUnaligned<Vector<byte>>(ref Unsafe.AddByteOffset(ref second, n));
}
// Use Sse2.IsSupported check to remove this branch at Jit time if the CPU does not support it.
else if (!Avx2.IsSupported && Sse2.IsSupported)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the redundant !Avx2.IsSupported needed here? I would expect that the if (Avx2.IsSupported) above will be allow JIT to dead-code eliminate the else part.

@benaadams benaadams closed this Jan 21, 2019
@benaadams
Copy link
Member Author

Tried lots of permutations nothing that's a significant and consistent win; will try another time.

Has given me ideas on the other vectorized methods where there should be easier wins.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants