Conversation
merge from base
merge base
…ps512, castps256_ps512
…pd512, castpd256_pd512
…astsi512_ps, castsi512_pd
|
r? @Amanieu (rust_highfive has picked a reviewer for you, use r? to override) |
|
For boradcast_f32x4,i32x4,f64x4,i64x4 |
|
Add |
crates/core_arch/src/x86/avx512f.rs
Outdated
| #[target_feature(enable = "avx512f")] | ||
| #[cfg_attr(test, assert_instr(vbroadcas))] //should be vpbroadcastq | ||
| pub unsafe fn _mm512_broadcastq_epi64(a: __m128i) -> __m512i { | ||
| simd_shuffle8(a, a, [1, 1, 1, 1, 1, 1, 1, 1]) |
There was a problem hiding this comment.
Shouldn't this be [0, 0, 0, 0, 0, 0, 0, 0]? The documentation says the low element is broadcast.
There was a problem hiding this comment.
Shouldn't this be
[0, 0, 0, 0, 0, 0, 0, 0]? The documentation says the low element is broadcast.
Yes. You are correct. Thanks.
crates/core_arch/src/x86/avx512f.rs
Outdated
| #[target_feature(enable = "avx512f")] | ||
| #[cfg_attr(test, assert_instr(vperm))] //should be vpunpckhqdq | ||
| pub unsafe fn _mm512_unpackhi_epi64(a: __m512i, b: __m512i) -> __m512i { | ||
| simd_shuffle8(a, b, [2, 10, 3, 11, 2 + 4, 10 + 4, 3 + 4, 11 + 4]) |
There was a problem hiding this comment.
The correct shuffle should be [1, 9, 3, 11, 5, 13, 7, 15].
You can get this by compiling the intrinsic with Clang and extracting the LLVM IR: https://godbolt.org/z/PzoKPz
| b, | ||
| [0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 12, 13, 14, 15], | ||
| ), | ||
| _ => simd_shuffle16(a, b, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 16, 17, 18, 19]), |
There was a problem hiding this comment.
Could you change this to panic if an invalid imm8 is passed in? Same for all the others.
There was a problem hiding this comment.
Could you change this to panic if an invalid
imm8is passed in? Same for all the others.
As an example _mm512_inserti32x4(a, b, imm8)
In clang, it will check if imm8 is [0..3] in compile time. But in Rust we need to use constify_imm2 to check it in the run time.
But, constify_imm2 uses & 0b11 to make everything pass.
macro_rules! constify_imm2 {
($imm8:expr, $expand:ident) => {
#[allow(overflowing_literals)]
match ($imm8) & 0b11 {
0 => $expand!(0),
1 => $expand!(1),
2 => $expand!(2),
_ => $expand!(3),
}
};
}
So I need to create something like constify_imm2_sae?
match ($imm8)
0=>...
1=>...
...
_=> panic!(xxxxx)?
In x86_64: avx. Clang will check if index is [0..3], so we need to add the invalid check on this?
pub unsafe fn _mm256_insert_epi64(a: __m256i, i: i64, index: i32) -> __m256i {
let a = a.as_i64x4();
match index & 3 {
0 => transmute(simd_insert(a, 0, i)),
1 => transmute(simd_insert(a, 1, i)),
2 => transmute(simd_insert(a, 2, i)),
_ => transmute(simd_insert(a, 3, i)),
}
}
There was a problem hiding this comment.
or we just use "assert!(imm8 >= 0 && imm8 <= 255)"
There was a problem hiding this comment.
Ideally the compiler would reject out-of-range values at compile-time, but rustc doesn't support this yet. So instead we should panic if an out-of-range value is passed. This allows us to reject out-of-range values in the future once rustc has that functionality.
You can either do the same as consitfy_imm2_sae or just assert!(imm8 <= 3).
There was a problem hiding this comment.
Ok. I will do assert!(imm8 <= 3) and update a new version.
crates/core_arch/src/x86/avx512f.rs
Outdated
| #[target_feature(enable = "avx512f")] | ||
| #[cfg_attr(test, assert_instr(vperm))] //should be vunpackhpd | ||
| pub unsafe fn _mm512_unpackhi_pd(a: __m512d, b: __m512d) -> __m512d { | ||
| simd_shuffle8(a, b, [2, 10, 3, 11, 2 + 4, 10 + 4, 3 + 4, 11 + 4]) |
There was a problem hiding this comment.
The correct shuffle here should be [1, 9, 3, 11, 5, 13, 7, 15].
| /// [Intel's documentation](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=512_mask_broadcastd_epi32&expand=546) | ||
| #[inline] | ||
| #[target_feature(enable = "avx512f")] | ||
| #[cfg_attr(test, assert_instr(vpbroadcast))] //should be vpbroadcastd |
There was a problem hiding this comment.
This and the one below generate the correct vpbroadcastd instruction. The comment is wrong.
crates/core_arch/src/x86/avx512f.rs
Outdated
| /// [Intel's documentation](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=512_broadcast_i32x4&expand=510) | ||
| #[inline] | ||
| #[target_feature(enable = "avx512f")] | ||
| //#[cfg_attr(test, assert_instr(vbroadcast))] should be vbroadcasti32x4 |
There was a problem hiding this comment.
Add a comment explaining that LLVM doesn't generate a vbroadcast instruction for this intrinsic. Please don't leave commented-out code lying around.
There was a problem hiding this comment.
Add a comment explaining that LLVM doesn't generate a
vbroadcastinstruction for this intrinsic. Please don't leave commented-out code lying around.
Yes, For boradcast_f32x4,i32x4,f64x4,i64x4
The instruction generated from Linux and msvc are totally different. (shuffle vs broadcast).
So I remove the test and add a comment?
There was a problem hiding this comment.
Yes, you can remove the test and add a comment.
The primary purpose is to get the fixes from rust-lang/stdarch#920 and rust-lang/stdarch#922. The other changes included are rust-lang/stdarch#917 and rust-lang/stdarch#919.
castps128_512, castps_pd, castps_si512, castps512_ps256, castps512_ps128, castps128_ps512, castps256_ps512
castpd_ps, castpd_si512, castpd512_pd256, castpd512_pd128, castpd128_pd512, castpd256_pd512
castsi512_si128, castsi512_si256, castsi128_si512, castsi256_si512, castsi512_ps, castsi512_pd
broadcastsd_pd, broadcastss_ps, broadcastd_epi32, broadcastq_epi64
broadcast_i32x4, broadcast_i64x4, broadcast_f32x4, broadcast_f64x4
addnot: epi32, epi64
insertf32x4, insertf64x4, inserti32x4, inserti64x4
mask_blend: epi32,epi64,ps,pd
unpackhi: epi32,epi64,ps,pd
unpacklo: epi32,epi64,ps,pd