-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Description
Background and motivation
VPCLMULQDQ is supported by Intel in the Ice Lake and newer architectures, and by AMD in Zen 4. It allows for parallel pclmulqdq in Vector256 and Vector512 and is important for implementing vectorized CRC32 among other things.
API Proposal
namespace System.Runtime.Intrinsics.X86;
public abstract class Pclmulqdq : Sse2
{
public abstract class V256
{
public static new bool IsSupported { get; }
public static Vector256<long> CarrylessMultiply(Vector256<long> left, Vector256<long> right, [ConstantExpected] byte control);
public static Vector256<ulong> CarrylessMultiply(Vector256<ulong> left, Vector256<ulong> right, [ConstantExpected] byte control);
}
public abstract class V512
{
public static new bool IsSupported { get; }
public static Vector512<long> CarrylessMultiply(Vector512<long> left, Vector512<long> right, [ConstantExpected] byte control);
public static Vector512<ulong> CarrylessMultiply(Vector512<ulong> left, Vector512<ulong> right, [ConstantExpected] byte control);
}
}API Usage
Examples of vectorized CRC32 implementations using the equivalent C intrinsics abound. One such example: https://github.com/corsix/fast-crc32/blob/main/sample_avx512_vpclmulqdq_crc32c_v4s5x3.c
Alternative Designs
The Pclmulqdq256 and Pclmulqdq512 classes could be nested under Pclmulqdq rather than being top-level classes inheriting from it. Since this ISA includes only a single instruction, that may be preferable.
A case could be made for making Avx the base of Pclulqdq256, as VEX encoding is required for vpclmulqdq. Likewise, Pclmulqdq512 could have Avx512F as a base given its requirement of EVEX encoding. However, the relationship will change with AVX10, where EVEX support will not imply 512-bit vector support.
Risks
N/A