I'm opening this as a tracking issue for the SIMD intrinsics in the `{std,core}::arch::wasm32` module. Eventually we're going to want to stabilize these intrinsics for the WebAssembly target, so I think it's good to have a canonical place to talk about them! I'm also going to update the `#![unstable]` annotations to point to this issue to direct users here if they want to use these intrinsics.
The WebAssembly SIMD proposal is currently in "phase 3". I would say that we probably don't want to consider stabilizing these intrinsics until the proposal has at least reached "phase 4" where it's being standardized, because there are still changes to the proposal happening over time (small ones at this point, though). As a brief overview, the WebAssembly SIMD proposal adds a new type, `v128`, and a suite of instructions to perform data processing with this type. The intention is that this is readily portable to a lot of architectures so usage of SIMD can be fast in lots of places.
For Rust stabilization purposes the code for all these intrinsics lives in the rust-lang/stdarch git repository. All code lives in `crates/core_arch/src/wasm32/simd128.rs`. I've got a large refactoring and sync queued up for that module, so I'm going to be writing this issue with the assumption that it will land mostly as designed there.
Currently the design principles for the SIMD intrinsics are:
- Like the existing `memory_size`, `memory_grow`, and `unreachable` intrinsics, most intrinsics are named after the instruction they represent. There is generally a 1:1 mapping between new WebAssembly instructions and intrinsics in the module.
- The type signature of each intrinsic is intended to match the textual description of the corresponding instruction.
- Each intrinsic has `#[target_feature(enable = "simd128")]`, which forces them all to be `unsafe`.
- Some gotchas for specific intrinsics are:
  - `v128.const` is exposed through a suite of `const` functions, one for each vector type (but not unsigned, just signed integers). Additionally, the arguments are not actually required to be constant, so it's expected that the compiler will make the best choice about how to generate a runtime vector.
  - Instructions using lane indices, such as `v8x16_shuffle` and `*_{extract,replace}_lane`, use const generics to represent constant arguments. This is different from x86_64, which uses the older `#[rustc_args_required_const]` attribute.
  - Shuffles are provided for `v16x8`, `v32x4`, and `v64x2` as conveniences instead of only providing `v8x16_shuffle`. All of them are implemented in terms of the `v8x16.shuffle` instruction, however.
  - There is a singular `v128` type, not a type for each size of vector that intrinsics operate with.
  - The `extract_lane` intrinsics return the value type associated with the intrinsic name; they do not all return `i32`, unlike the actual WebAssembly instruction. This means we do not have `extract_lane_s` and `extract_lane_u` intrinsics, because the compiler will select the appropriate one depending on context.
It's important to note that clang has an implementation of these intrinsics in the `wasm_simd128.h` header. The current design of the Rust `wasm32` module differs in that:
- The prefix `wasm_*` isn't used.
- Only one datatype, `v128`, is exposed instead of types for each size/kind of vector.
- Naming can differ depending on the intrinsic. For example, clang has `wasm_i16x8_load_8x8` and `wasm_u16x8_load_8x8` while Rust has `i16x8_load8x8_s` and `i16x8_load8x8_u`.
Most of these differences are largely stylistic, but some are conveniences (like the extra forms of shuffles) which might be nice to expose in Rust as well. All the conveniences still compile down to one instruction; the difference is only in how users spell the instruction in code. I believe it should be possible for conveniences to live outside the standard library as well, however.
## How SIMD will be used
If the SIMD proposal were to move to phase 4 today, I think we're in a really good spot for stabilization. #74320 is a pretty serious bug we will want to fix before full stabilization, but I don't believe the fix will be hard to land in LLVM (I've already talked with some folks on that side).
Other than that, SIMD-in-wasm differs from other platforms in that a binary containing SIMD will refuse to run on engines that do not have SIMD support. In that sense there is no runtime feature detection available to SIMD consumers (at least not natively).
After rust-lang/stdarch#874 lands, programs will simply use `#[target_feature(enable = "...")]` or `RUSTFLAGS` and everything should work. The SIMD intrinsics will always be exposed from the standard library (though the standard library itself will not use them) and available to users. If programs don't use the intrinsics then SIMD won't get emitted; otherwise, the binary will use `v128`.
## Open Questions
A set of things we'll need to settle on before stabilizing (and this will likely expand over time) is:
- The `*_load_*` and `*_store_*` instructions. Primarily the instructions that load 64 bits (8x8, 16x4, ...); I'm unsure of the types of their pointer arguments.
- The naming of `v8x16_shuffle` and lane management instructions.
- Whether `i8x16_extract_lane_s` is ok (e.g. having `i8x16_extract_lane` returning `i8` is all we need), same for `i16x8`.
- The `#[target_feature]` "requires unsafe" rules for these WebAssembly intrinsics. Intrinsics like `f32x4_splat` have no fundamental reason they need to be `unsafe`. The only reason they're `unsafe` is because `#[target_feature]` is used on them to ensure that SIMD instructions are generated in LLVM.
- Changing `*_{any,all}_true` to return a `bool`.